Open Conference Systems, ITACOSM 2019 - Survey and Data Science

Total Error Frameworks for Integrating Probability and Nonprobability Data
Paul P. Biemer

Building: Learning Center Morgagni
Room: Aula Magna 327
Date: 2019-06-07 10:30 AM – 11:30 AM
Last modified: 2019-05-06


The survey world is relying more heavily on administrative data and other “found” data for inference and decision making rather than survey or “design” data. Found data are data that are not primarily collected for statistical purposes but contain information that might be useful for inference. The data become “found” when they are used to achieve some statistical purpose through data mining or analysis. Typically, found datasets are nonprobability samples from some ill-defined universe. This paper focuses on the accuracy of data produced by integrating two or more datasets, particularly when one of those datasets is from a survey and the other is found. First, it considers the processes by which two or more data sources are integrated as well as the processes by which (hybrid) estimates are derived from integrated datasets. It then reviews several total error frameworks that have been proposed for evaluating the quality of the integrated dataset itself and several related frameworks for evaluating the quality of the hybrid estimates that may be produced from such datasets. The application of a total error framework is illustrated with an administrative dataset that is currently being integrated with a national survey dataset.

Full Text: SLIDES