USING ML TECHNIQUES FOR ESTIMATION WITH NON-PROBABILISTIC SURVEY DATA

Jorge Luis Rueda Sánchez; Maria del Mar Rueda García; Ramón Ferri García; Beatriz Cobo Rodríguez

Open Conference Systems, CLADAG2023

Last modified: 2023-07-07

Abstract

Online surveys, despite their advantages in cost and effort, are particularly prone to selection bias because of the differences between the target population and the potentially covered population. Several techniques have been developed in recent years to address this issue. Propensity Score Adjustment, kernel weighting, Statistical Matching (or mass imputation), doubly robust estimation and superpopulation modeling are relevant techniques for mitigating selection bias. These techniques use the sample to train a model that captures either the behaviour of the target variable to be estimated or the propensity of units to participate in the volunteer sample. The modeling step has usually been carried out with linear regression, but machine learning (ML) algorithms have been pointed out as promising alternatives. In this study we examine the use of these algorithms in the nonprobability survey context in order to evaluate and compare their performance and suitability for the problem.
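As an illustration of the propensity-modeling idea mentioned above, the following is a minimal sketch of Propensity Score Adjustment on synthetic data. It is not the authors' implementation: the simulated population, the selection mechanism, the helper names (`fit_logistic`, `predict_proba`) and the inverse-odds weighting are all assumptions made for this example, and the hand-written logistic regression only stands in for whichever classifier (linear or ML) is chosen for the modeling step.

```python
import numpy as np

def fit_logistic(X, z, n_iter=1000, lr=0.5):
    """Gradient-ascent logistic regression (hypothetical stand-in for any classifier)."""
    Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * Xb.T @ (z - p) / len(z)     # mean log-likelihood gradient
    return beta

def predict_proba(beta, X):
    """Predicted probability of belonging to the volunteer sample."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

rng = np.random.default_rng(0)

# Synthetic population (assumption): one covariate x, target y correlated with x.
N = 20_000
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(size=N)
true_mean = y.mean()

# Volunteer (nonprobability) sample: participation is more likely for large x.
p_participate = np.clip(0.05 * np.exp(0.8 * x), 0.0, 1.0)
vol = rng.random(N) < p_participate

# Reference probability sample: simple random sample, covariates only.
ref = rng.choice(N, size=2000, replace=False)

# Stack both samples and model membership in the volunteer sample.
X_stack = np.concatenate([x[vol], x[ref]])[:, None]
z = np.concatenate([np.ones(vol.sum()), np.zeros(len(ref))])
beta = fit_logistic(X_stack, z)

# Inverse-odds weights for volunteers, then a weighted (Hajek-type) mean.
pi_hat = predict_proba(beta, x[vol][:, None])
weights = (1.0 - pi_hat) / pi_hat
naive_mean = y[vol].mean()
psa_mean = np.sum(weights * y[vol]) / np.sum(weights)

print("population mean:", round(true_mean, 3))
print("naive volunteer mean:", round(naive_mean, 3))
print("PSA-weighted mean:", round(psa_mean, 3))
```

In this sketch the reference sample supplies only covariates, while the target variable is observed only for volunteers. Swapping `fit_logistic` for a machine learning classifier changes the propensity model without altering the weighting step.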