Simultaneous unsupervised and supervised classification modeling for clustering, model selection and dimensionality reduction

Mario Fordellone; Maurizio Vichi

Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

Last modified: 2018-06-06

Abstract

In unsupervised classification, the choice of the number of clusters and the lack of inferential tools for assessing and interpreting the final partition are important limitations that can undermine the reliability of the results. In this work, we propose combining unsupervised classification with supervised methods in order to improve the assessment and interpretation of the obtained partition, to identify the correct number of clusters, and to select the variables that contribute most to defining the group structure in the data. An application to real data illustrates the utility of the proposed approach.
