Quality issues in multisource statistics

Giovanna Brancato

Open Conference Systems, ITACOSM 2019 - Survey and Data Science

Building: Learning Center Morgagni
Room: Aula 209
Date: 2019-06-07 12:00 PM – 01:30 PM
Last modified: 2019-05-06

Abstract

The need to reduce the statistical burden and the growing availability of administrative sources have led to an increase in statistical production based on these sources. However, administrative sources often do not adequately represent the target population and the target measures, so complementary surveys are needed, carried out within a multisource design.

In this context, multisource statistics are understood as the use of two or more data sources for direct estimation, i.e. direct tabulation, or substitution and supplementation for direct collection (SN-MIAD, 2014). The source data can be survey data (sample or census), administrative data, or any other kind of data obtained from public or private data owners. A wider interpretation of multisource statistics may include the use of sources additional to survey data for indirect purposes, e.g. to assist sampling design and estimation. This interpretation is not adopted in this work.

It is customary to describe the quality of statistics in terms of the widely known output quality dimensions (relevance, accuracy and reliability, coherence and comparability, timeliness and punctuality, accessibility and clarity). “Output quality refers to the final statistical product and should provide the user with easy to understand information on the quality of final data” (Agafiţei et al., 2015). With respect to accuracy, no matter how many sources are integrated, the quality of the final estimates is reflected in the mean squared error (MSE), which takes into account the various sources of error. As pointed out by Lyberg and Stukel (2017), usually only the variance and bias components related to the predominant sources of error are considered, one at a time. The work of statistical offices is oriented more towards developing methods for reducing specific sources of error, and no real attempt is made to estimate the total MSE. Although challenging, it is vital that quality methods keep pace with these developments by addressing the sources of error.
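
As a reminder of the decomposition referred to above, the following is a minimal sketch rather than a formula taken from the cited works. The notation is assumed for illustration only: an estimator $\hat{\theta}$ of a target parameter $\theta$, and a total error written as the sum of $K$ source-specific error contributions $e_k$ (for example coverage, measurement or linkage errors).

```latex
% Requires amsmath and amssymb. All symbols are illustrative assumptions,
% not notation from the original abstract.
\[
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta}-\theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2,
\qquad
\mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta .
\]
% Assumption: the total error is additive over K error sources.
\[
\hat{\theta}-\theta = \sum_{k=1}^{K} e_k
\;\Longrightarrow\;
\mathrm{MSE}(\hat{\theta})
  = \sum_{k=1}^{K} \mathbb{E}\big[e_k^2\big]
  + 2 \sum_{k<l} \mathbb{E}\big[e_k e_l\big].
\]
```

Under this assumed additive form, assessing each error source one at a time recovers only the terms of the first sum. The cross terms between sources are left out, which is one way to read the remark that the total MSE is rarely estimated.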

Given these premises, this paper will deal with quality assurance, i.e. how to reduce errors, and with quality assessment in multisource statistics. The focus will be only on the most significant quality dimensions. Some noteworthy findings from research in this field will be reviewed, including the experiences of the European ESSnet Komuso (Quality in multisource statistics). The challenges in this area will be highlighted.

Agafiţei M., Gras F., Kloek W., Reis F., Vâju S. (2015). Measuring output quality for multisource statistics in official statistics: some directions. Statistical Journal of the IAOS 31, 203-211.

Lyberg L.E., Stukel D.M. (2017). The roots and evolution of the Total Survey Error concept. In “Total Survey Error in Practice”, editors Biemer P.P., de Leeuw E., Eckman S., Edwards B., Kreuter F., Lyberg L.E., Tucker C.N., West B.T., Wiley Series in Survey Methodology, Hoboken, New Jersey.

Statistical Network Methodologies for an Integrated use of Administrative Data in the statistical process, SN-MIAD (2014). Output A.1. Usage of Administrative Data Sources for Statistical Purposes. https://ec.europa.eu/eurostat/cros/system/files/Usage%20of%20Administrative%20Data%20Sources%20for%20Statistical%20Purposes.pdf
