Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

Unsupervised clustering of Italian schools via non-parametric multilevel models
Chiara Masci, Francesca Ieva, Anna Maria Paganoni

Last modified: 2018-04-26


This work proposes an EM algorithm for the estimation of non-parametric mixed-effects models (the NPEM algorithm) and applies it to the 2013/2014 dataset of the National Institute for the Educational Evaluation of Instruction and Training (INVALSI) as a tool for unsupervised clustering of Italian schools. Among its main novelties, the NPEM algorithm, when applied to hierarchical data, allows the covariates to be group-specific and assumes that the random effects follow a discrete distribution with an a priori unknown number of support points. In doing so, it induces an automatic clustering of the grouping factor at the higher level of the hierarchy. In the application to INVALSI data, the NPEM algorithm identifies latent groups of schools that differ in their effects on student achievement.
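To make the idea of discrete random effects concrete, the following is a minimal sketch and not the authors' NPEM implementation. It assumes a much simpler setting than the abstract describes: a random-intercept model with a single covariate, y_ij = c_l + beta * x_ij + eps_ij, where the intercept of each group (school) takes one of a fixed number of support points c_l with weight w_l. The function name `em_discrete_intercept`, its parameters, and the choice to fix the number of support points in advance are all assumptions made for illustration; the abstract states that in NPEM the number of support points is not known a priori.

```python
import numpy as np


def em_discrete_intercept(y, x, group, n_points=2, n_iter=200):
    """Illustrative EM (conditional-maximization variant) for a random-intercept
    model whose intercept follows a discrete distribution on n_points values.
    Hypothetical helper, not the NPEM algorithm of the paper."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    groups, gidx = np.unique(np.asarray(group), return_inverse=True)
    n_groups = len(groups)

    # Initialisation: pooled OLS slope, support points from quantiles
    # of the per-group mean residuals, uniform weights.
    beta = np.polyfit(x, y, 1)[0]
    resid = y - beta * x
    group_means = np.bincount(gidx, weights=resid) / np.bincount(gidx)
    c = np.quantile(group_means, (np.arange(n_points) + 0.5) / n_points)
    w = np.full(n_points, 1.0 / n_points)
    sigma2 = np.var(resid)

    for _ in range(n_iter):
        # E-step: posterior probability that each group sits on each support point.
        r = resid[:, None] - c[None, :]                      # (n_obs, n_points)
        ll_obs = -0.5 * (np.log(2 * np.pi * sigma2) + r**2 / sigma2)
        ll_grp = np.zeros((n_groups, n_points))
        np.add.at(ll_grp, gidx, ll_obs)                      # sum log-lik within group
        log_post = np.log(w) + ll_grp
        log_post -= log_post.max(axis=1, keepdims=True)      # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # M-step (conditional updates): weights, support points, slope, variance.
        w = np.clip(post.mean(axis=0), 1e-12, None)
        w /= w.sum()
        P = post[gidx]                                       # (n_obs, n_points)
        c = (P * resid[:, None]).sum(axis=0) / (P.sum(axis=0) + 1e-12)
        beta = np.sum(x * (y - P @ c)) / np.sum(x**2)
        resid = y - beta * x
        sigma2 = np.sum(P * (resid[:, None] - c[None, :]) ** 2) / len(y)

    order = np.argsort(c)                                    # report points in increasing order
    return {
        "beta": beta,
        "sigma2": sigma2,
        "points": c[order],
        "weights": w[order],
        "posterior": post[:, order],
        "groups": groups,
    }
```

In this sketch, assigning each group to the support point with the highest posterior probability (`posterior.argmax(axis=1)`) yields the clustering of the higher-level units, which is the mechanism the abstract refers to as automatic clustering of the grouping factor.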


