Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

A robust clustering procedure with unknown number of clusters
Francesco Dotto

Last modified: 2018-05-23

Abstract


A new methodology for robust clustering without specifying in advance the underlying number of Gaussian clusters is proposed. The procedure is based on iteratively trimming, assessing the goodness of fit, and reweighting. The forward version of our procedure is initialized with a high trimming level and K = 1 populations. The procedure is then iterated throughout a fixed sequence of decreasing trimming levels. New observations are added at each step and, whenever a goodness of fit rule is not satisfied, the number of components K is increased. A stopping rule prevents our procedure from using outlying observations. Additional use of a backward criterion is discussed.
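
The control flow of the forward version can be illustrated with a deliberately simplified sketch. It is not the procedure proposed here: it assumes spherical clusters of a common, known scale (in the spirit of trimmed k-means) instead of general Gaussian components, and it omits the reweighting step and the backward criterion. The names `forward_clustering`, `radius` and `min_size`, the goodness of fit rule (every retained point lies within `radius` of its centre) and the stopping rule (a candidate component needs at least `min_size` points nearby) are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def nearest(X, centers):
    """Index of, and Euclidean distance to, the closest centre for each row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def refit(X, centers, n_iter=10):
    """Update the centres on the retained points X (hard assignments, Lloyd-style)."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels, _ = nearest(X, centers)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers

def forward_clustering(X, alphas, radius, min_size=5):
    """Toy forward search over a decreasing sequence of trimming levels (assumed rules)."""
    n = len(X)
    centers = np.median(X, axis=0, keepdims=True)  # start with K = 1
    keep = np.zeros(n, dtype=bool)                 # nothing retained yet
    for alpha in alphas:
        h = int(np.ceil(n * (1 - alpha)))          # number of points to retain
        while True:
            _, dist = nearest(X, centers)
            idx = np.argsort(dist)[:h]             # the h best-fitted points
            if dist[idx[-1]] <= radius:            # assumed goodness of fit rule
                break
            seed = X[idx[-1]]                      # worst-fitted retained point
            support = np.sum(np.linalg.norm(X - seed, axis=1) <= radius)
            if support < min_size:                 # assumed stopping rule:
                return centers, keep               # new points look outlying
            centers = np.vstack([centers, seed])   # increase K by one
        keep = np.zeros(n, dtype=bool)
        keep[idx] = True
        centers = refit(X[keep], centers)
    return centers, keep

# Illustrative data: two clusters plus ten isolated outliers on a wide circle.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(120, 2))
B = rng.normal(10.0, 1.0, size=(60, 2))
angles = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
outliers = 5.0 + 40.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([A, B, outliers])

alphas = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]      # decreasing trimming levels
centers, keep = forward_clustering(X, alphas, radius=5.0)
print(len(centers), int(keep.sum()))
```

In this toy setting K is increased once, when points of the second cluster start entering the retained set, and the search stops at the last trimming level because the only remaining candidates are isolated points. A fixed `radius` is the weakest part of the sketch: a model-based rule would calibrate the cutoff from the fitted components rather than take it as given.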

