Machine Learning in Survey Research: Modeling Nonresponse and Completion Conditions from a Prediction Perspective

Christoph Kern

Open Conference Systems, ITACOSM 2019 - Survey and Data Science

Building: Learning Center Morgagni
Room: Aula 209
Date: 2019-06-06 09:00 AM – 10:30 AM
Last modified: 2019-05-06

Abstract

Advances in the field of machine learning have created an array of flexible methods for exploring and analyzing diverse data. These methods often do not require prior knowledge about the functional form of the relationship between the outcome and its predictors, and they focus specifically on prediction performance. Machine learning tools thereby offer promising advantages for survey researchers tackling emerging challenges in data collection and analysis.

This presentation features two examples of using prediction methods in survey research. The first uses machine learning to predict nonresponse in panel studies. The study investigates the potential of moving from post- to pre-correction of nonresponse in panel surveys by predicting dropouts in advance. For model building, information from multiple panel waves is used by introducing features that aggregate previous (non)response patterns. For model tuning and evaluation, temporal cross-validation is employed to account for the longitudinal data structure. Results based on data from the GESIS Panel indicate that promising prediction performance can be achieved over multiple panel waves.
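
As an illustration of the two modeling ideas above, the following minimal Python sketch aggregates a respondent's earlier (non)response pattern into features and builds temporal cross-validation folds in which training waves always precede the test wave. The response histories, the feature names, and the helper functions `history_features` and `temporal_folds` are hypothetical and are not taken from the study; the abstract does not specify the actual features or fold layout.

```python
# Hypothetical response histories: 1 = participated, 0 = nonresponse, one entry per wave.
histories = {
    "r1": [1, 1, 1, 1, 1],
    "r2": [1, 0, 1, 0, 0],
    "r3": [1, 1, 0, 1, 1],
}


def history_features(responses, wave):
    """Aggregate the (non)response pattern observed strictly before `wave`."""
    past = responses[:wave]
    # Count consecutive missed waves immediately preceding `wave`.
    streak = 0
    for r in reversed(past):
        if r == 1:
            break
        streak += 1
    return {
        "response_rate": sum(past) / len(past),
        "n_missed": len(past) - sum(past),
        "current_missed_streak": streak,
    }


def temporal_folds(n_waves, first_test_wave=2):
    """Yield (train_target_waves, test_target_wave) pairs.

    Waves are indexed from 0. Wave 0 only supplies history, so targets start
    at wave 1, and every training target lies before the test target.
    """
    for test_wave in range(first_test_wave, n_waves):
        yield list(range(1, test_wave)), test_wave


for train_waves, test_wave in temporal_folds(n_waves=5):
    # Each row pairs features from earlier waves with the response at the target wave.
    train_rows = [
        (history_features(h, t), h[t]) for h in histories.values() for t in train_waves
    ]
    test_rows = [
        (history_features(h, test_wave), h[test_wave]) for h in histories.values()
    ]
    print(f"train targets {train_waves}, test wave {test_wave}: "
          f"{len(train_rows)} train rows, {len(test_rows)} test rows")
```

Because each fold trains only on waves earlier than the one it is evaluated on, the evaluation mimics predicting dropout for an upcoming wave; any classifier could be fit on the resulting rows.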

The second example highlights the potential of machine learning for predicting completion conditions in mobile web surveys. In this study, prediction models are trained on acceleration data from smartphone respondents, collected in a lab experiment that systematically varied the completion conditions (e.g., standing or walking). The evaluation results indicate that the trained models can accurately predict completion conditions in mobile web surveys that collect acceleration data. This approach thereby makes it possible to compare response behavior between groups with different (predicted) completion conditions.
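
The general idea of predicting a completion condition from acceleration data can be sketched as follows. This is only a toy illustration under stated assumptions: the acceleration windows are synthetic, the two summary features (mean and standard deviation of the acceleration magnitude) are hypothetical choices, and a nearest-centroid rule stands in for whatever models the study actually trained.

```python
import math
import statistics


def window_features(samples):
    """Summarize a window of (x, y, z) acceleration samples as (mean, sd) of magnitude."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return (statistics.mean(magnitudes), statistics.pstdev(magnitudes))


def fit_centroids(windows, labels):
    """Average the feature vectors of all training windows sharing a label."""
    grouped = {}
    for window, label in zip(windows, labels):
        grouped.setdefault(label, []).append(window_features(window))
    return {
        label: tuple(statistics.mean(column) for column in zip(*features))
        for label, features in grouped.items()
    }


def predict(centroids, window):
    """Assign the label whose centroid is closest in feature space."""
    features = window_features(window)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def synthetic_window(amplitude, phase=0.0, n=20):
    """Synthetic data (assumption): gravity on the z axis plus a sinusoidal disturbance."""
    return [(0.0, 0.0, 9.8 + amplitude * math.sin(i + phase)) for i in range(n)]


# Small disturbance imitates standing, large disturbance imitates walking.
train_windows = [synthetic_window(0.05), synthetic_window(3.0)]
train_labels = ["standing", "walking"]
centroids = fit_centroids(train_windows, train_labels)

print(predict(centroids, synthetic_window(2.5, phase=0.5)))
print(predict(centroids, synthetic_window(0.1, phase=1.0)))
```

Once each respondent has a predicted condition, response behavior can be compared across the predicted groups, which is the use case the abstract describes.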