Last modified: 2018-06-06
Abstract
In unsupervised classification, the choice of the number of clusters and the lack of inferential tools for assessing and interpreting the final partition are important limitations that can undermine the reliability of the results. In this work, we propose combining unsupervised classification with supervised methods in order to improve the assessment and interpretation of the obtained partition, to identify the correct number of clusters, and to select the variables that contribute most to defining the group structure in the data. An application to real data is presented to illustrate the utility of the proposed approach.