Data-driven Transformations for the Estimation of Small Area Means

Nora Ulrike Würz; Timo Schmid; Nikos Tzavidis

Open Conference Systems, ITACOSM 2019 - Survey and Data Science

Building: Learning Center Morgagni
Room: Aula 210
Date: 2019-06-06 09:00 AM – 10:30 AM
Last modified: 2019-05-06

Abstract

Many surveys face the problem of small sample sizes within certain subpopulations. Small area estimation is a powerful tool for overcoming this problem. Many small area models are built on linear mixed models and therefore require, among other conditions, that the error terms are Gaussian. In real applications, this assumption is often violated for variables such as income. This work therefore addresses the potential failure of the model assumptions by transforming the dependent variable in the context of restricted data access.
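As a rough illustration of what a data-driven transformation of the dependent variable can look like, the sketch below picks a Box-Cox parameter by maximizing a profile log-likelihood over a grid. The Box-Cox family, the intercept-only model, the synthetic `income` data, and all function names are assumptions made for this example; the abstract does not state which transformation family or estimation criterion is used.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform of positive values; lam = 0 is the log transform."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(y, lam):
    """Gaussian profile log-likelihood of lam for an intercept-only model.

    Includes the Jacobian term (lam - 1) * sum(log y); constants are dropped.
    """
    z = box_cox(y, lam)
    n = len(z)
    mean_z = sum(z) / n
    var_z = sum((v - mean_z) ** 2 for v in z) / n
    return -0.5 * n * math.log(var_z) + (lam - 1.0) * sum(math.log(v) for v in y)


def choose_lambda(y, grid):
    """Return the grid value with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(y, lam))


# Synthetic, right-skewed "income" (log-normal), purely for illustration.
rng = random.Random(1)
income = [math.exp(rng.gauss(7.0, 0.8)) for _ in range(2000)]

grid = [i / 20 for i in range(-20, 41)]  # candidate lambdas from -1.0 to 2.0
lam_hat = choose_lambda(income, grid)
print("selected lambda:", lam_hat)
```

Because the synthetic data are log-normal, the selected parameter should land near zero, which corresponds to the log transformation. In a small area model the same idea would be applied to the mixed-model likelihood rather than to an intercept-only model.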

When register covariates are available at the unit level, both ad hoc and data-driven transformations of the underlying data have been used in the literature. In many applications, however, for example in Germany, register covariates are available only at an aggregated level. We therefore propose small area methods with (data-driven) transformations for situations in which only aggregated register information is available. Two bias corrections are needed: a) one for the back-transformation of the dependent variable and b) one for the use of aggregated register information (covariates). In most real situations, the underlying distribution of the covariates cannot be assumed to be known, so we estimate it from the available sample. From this estimated distribution, the required total of the transformed covariates can be calculated, and from that total the small area mean can be obtained using a bias correction.
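The two sources of bias named above can be made concrete with a small numerical sketch. It is not the authors' estimator: it assumes a log transformation, a single covariate, no random area effect, and model parameters (`b0`, `b1`, `sigma`) that are treated as known. The function name, the synthetic population, and the simple mean-shift used to align the sample with the register mean are all hypothetical choices for this example.

```python
import math
import random
import statistics


def compare_estimators(seed=42, n_pop=20000, n_sample=500):
    """Compare three back-transformed estimates of one area mean."""
    rng = random.Random(seed)
    b0, b1, sigma = 1.0, 0.5, 0.6  # hypothetical values, treated as known

    # Synthetic area population: log(y) = b0 + b1 * x + e, e ~ N(0, sigma^2).
    x_pop = [rng.gauss(2.0, 1.0) for _ in range(n_pop)]
    y_pop = [math.exp(b0 + b1 * x + rng.gauss(0.0, sigma)) for x in x_pop]
    true_mean = statistics.fmean(y_pop)

    # Restricted access: only the area mean of x is known, plus a sample of x.
    x_bar_register = statistics.fmean(x_pop)
    x_sample = rng.sample(x_pop, n_sample)

    # Naive: plug the aggregate into the model and simply exponentiate.
    naive = math.exp(b0 + b1 * x_bar_register)

    # (a) Back-transformation correction: E[exp(e)] = exp(sigma^2 / 2).
    back_only = math.exp(b0 + b1 * x_bar_register + sigma ** 2 / 2)

    # (b) Aggregation correction: replace exp(b1 * mean(x)) by the mean of
    # exp(b1 * x) over a covariate distribution estimated from the sample,
    # here the empirical sample distribution shifted to the register mean.
    shift = x_bar_register - statistics.fmean(x_sample)
    mean_exp = statistics.fmean(math.exp(b1 * (x + shift)) for x in x_sample)
    both = math.exp(b0 + sigma ** 2 / 2) * mean_exp

    return {"true_mean": true_mean, "naive": naive,
            "back_only": back_only, "both": both}


result = compare_estimators()
for name, value in result.items():
    print(f"{name:>10}: {value:.2f}")
```

In this setup the naive estimate falls below the true mean, the back-transformation correction closes part of the gap, and averaging the transformed covariate over the estimated distribution closes most of the rest. A smoother density estimate could replace the empirical distribution used here.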

Extensive model-based simulations are used for comparing the presented methodology to alternative unit-level methodologies for estimating small area means. Finally, the need for such methods is illustrated in a design-based simulation study using real census data from Mexico.
