Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

IT Solutions for Analyzing Large-Scale Statistical Datasets: Scanner Data for CPI
Annunziata Fiore, Antonella Simone, Antonino Virgillito

Last modified: 2017-04-29

Abstract


In this paper we present the issues and challenges of handling large datasets such as those involved in the Scanner Data project at Istat. We explain the motivations behind the design of the IT architecture backing the project, as well as the solutions introduced as part of a broader approach to modernising the tools and techniques used for data storage and processing at Istat, in anticipation of the challenges that Big Data and Data Science pose for national statistical institutes (NSIs). We show how the IT architecture, applied to the ongoing testing phase of the project, can support the methodological choices for the construction of consumer price microindices. In particular, we present the results of an analysis, carried out over the entire dataset, that compares different approaches (yearly updated fixed basket vs. chained base price) by estimating the impact of missing values and replacements in both cases. Finally, we discuss in depth the benefits and trade-offs of using Big Data technology for statistical production.