Reconstructing missing data sequences in multivariate time series: an application to environmental data
Maria Lucia Parrella, Giuseppina Albano, Michele La Rocca, Cira Perna

Last modified: 2018-05-18


Missing data arise in many statistical analyses and can significantly affect the conclusions drawn from the data. For environmental data, a common approach adopted by Environmental Protection Agencies is to delete observations with incomplete information from the study, which leads to a substantial underestimation of many of the indexes commonly used to evaluate air quality. In multivariate time series, not only isolated values but also long sequences of some components may be missing. We propose a new procedure that reconstructs the missing sequences by exploiting both the spatial correlation and the serial correlation of the multivariate time series. The procedure is based on a spatial dynamic model and imputes each missing value as a linear combination of the contemporaneous neighboring observations and their lagged values. It is designed for spatio-temporal data, although it is general enough to be applied to any stationary multivariate time series. The procedure has been applied to pollution data, with remarkably satisfactory performance.
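
The imputation idea can be illustrated with a minimal sketch. This is not the spatial dynamic model used by the authors: it assumes a plain least-squares regression of the incomplete component on the other components at times t and t-1, fitted on the rows where everything is observed. The function name `impute_sequence`, the single lag, and the use of all other components as "neighbors" are assumptions made only for illustration.

```python
import numpy as np

def impute_sequence(Y, j):
    """Fill NaNs in column j of a T x p array Y.

    Hypothetical simplification: column j is regressed by ordinary
    least squares on the other columns at time t and at time t-1.
    """
    Y = np.asarray(Y, dtype=float)
    T, p = Y.shape
    others = [k for k in range(p) if k != j]

    # Design matrix for t = 1..T-1: intercept, neighbors at t, neighbors at t-1.
    X = np.column_stack([np.ones(T - 1), Y[1:, others], Y[:-1, others]])
    y = Y[1:, j]

    # A row is usable only if every regressor is observed.
    usable = ~np.isnan(X).any(axis=1)
    observed = usable & ~np.isnan(y)
    missing = usable & np.isnan(y)

    # Fit the coefficients on fully observed rows.
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)

    # Predict the missing entries; +1 maps regression rows back to time indices.
    out = Y.copy()
    rows = np.where(missing)[0] + 1
    out[rows, j] = X[missing] @ beta
    return out
```

Because the regressors come only from the other components, the sketch can fill a long gap in one component without needing that component's own lagged values, which are themselves missing inside the gap. A missing value at the first time point, or at a time when a neighbor is also missing, is left as NaN.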


