Open Conference Systems, ITACOSM 2019 - Survey and Data Science

Beyond the sampling errors: the effects of Centralised data collection on Total Survey Errors
Pasquale Papa

Building: Learning Center Morgagni
Room: Aula 209
Date: 2019-06-06 03:30 PM – 04:40 PM
Last modified: 2019-05-23


In 2016 the Italian National Statistical Institute started a reorganisation process whose main objective was to enrich the supply and the quality of the information produced by improving the effectiveness and efficiency of its statistical processes. The resulting organisational set-up was characterised by the centralisation of all support services, which were separated from thematic statistical production. The introduction of a specialised data collection function led to a review of the organisational structure of the data collection processes and to the redesign of many of the management procedures in use.

The conceptual framework of "Total Survey Error" (TSE) aims to highlight survey errors beyond those due to sampling (Groves, Lyberg, 2011). The TSE framework can also include unit error, which arises from discrepancies in the identification, characterisation and delineation of the relevant statistical units.


TSE identifies two major divisions: variance versus bias on the one hand, and errors of observation versus errors of non-observation on the other. Errors of non-observation normally include coverage errors and both unit and item non-response. All these kinds of errors, and notably unit errors, can be reduced by a Centralised Data Collection (CDC) approach. Errors of observation concern the difference between the "true" value of a survey variable and its reported or recorded values. These differences can be caused, for example, by the data collection mode, the questionnaire, the respondents or the interviewers. A CDC approach can act on all of these aspects by optimising data collection modes and harmonising procedures, and can consequently have a positive impact on TSE.
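The variance/bias division is conventionally summarised in the TSE literature through the mean squared error (MSE) of an estimator $\hat\theta$ of a population quantity $\theta$. The identity below is the standard textbook decomposition, given here for orientation rather than as a formula taken from the paper:

$$\mathrm{MSE}(\hat\theta) = E\big[(\hat\theta - \theta)^2\big] = \mathrm{Var}(\hat\theta) + \big[\mathrm{Bias}(\hat\theta)\big]^2, \qquad \mathrm{Bias}(\hat\theta) = E[\hat\theta] - \theta$$

Each error source (coverage, non-response, measurement) can contribute to the variance term, to the bias term, or to both.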


The purpose of this paper is to demonstrate that CDC provides greater control over the TSE of surveys. The effect of TSE reduction on the statistical output can be evaluated across the different quality dimensions: Relevance, Coherence, Accuracy, Timeliness, Comparability and Accessibility. Most of these dimensions are expressed by "non-statistical" indicators.

Since one of the major limitations of the TSE framework is its lack of measurable components, the paper focuses on several examples that support the hypothesis that CDC has a positive impact both on TSE and on the "non-statistical" quality components. The examples highlight how CDC facilitates the decomposition of errors and the interdisciplinary dialogue between thematic experts and statisticians needed to analyse the causes of TSE. Moreover, CDC reduces the heavy burden involved in measuring some TSE components.
