Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

Clustering of histogram data: a topological learning approach
Guénaël Cabanes, Younès Bennani, Rosanna Verde, Antonio Irpino

Last modified: 2017-05-09

Abstract


A histogram data object is described by a set of distributions. In this paper, we propose an approach to clustering such data based on an adaptation of the Self-Organizing Map (SOM) algorithm. The idea is to combine the dimension reduction obtained with a SOM and the clustering of the data in this reduced space. The L2 Wasserstein distance is used to measure the dissimilarity between distributions and to estimate local data densities in the original space. The main advantage of the proposed algorithm is that the number of clusters is found automatically. Experiments on synthetic and real data sets demonstrate the validity of the proposed approach.
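
To make the dissimilarity measure concrete, the following is a minimal sketch of the L2 Wasserstein distance between two one-dimensional histograms, computed from their quantile functions as the square root of the integral over t in (0, 1) of (Qa(t) - Qb(t))^2. It is an illustration under stated assumptions, not the authors' implementation: the function names, the midpoint-rule grid size n_grid, and the assumptions that mass is spread uniformly within each bin and that every bin weight is strictly positive are all hypothetical choices made here.

```python
from bisect import bisect_left
from itertools import accumulate
from math import sqrt


def histogram_quantile(edges, weights, t):
    """Quantile at level t in (0, 1) of a histogram.

    Assumes (for this sketch) that mass is uniform within each bin and
    that all weights are strictly positive. `edges` has len(weights) + 1
    increasing values.
    """
    cum = list(accumulate(weights))
    cdf = [0.0] + [c / cum[-1] for c in cum]
    # First bin whose cumulative mass reaches t, clamped to a valid bin index.
    i = min(max(bisect_left(cdf, t), 1), len(weights))
    frac = (t - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
    return edges[i - 1] + frac * (edges[i] - edges[i - 1])


def l2_wasserstein(edges_a, weights_a, edges_b, weights_b, n_grid=1000):
    """Approximate L2 Wasserstein distance between two 1-D histograms.

    The integral of the squared quantile difference over (0, 1) is
    approximated with a midpoint rule on n_grid points.
    """
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        qa = histogram_quantile(edges_a, weights_a, t)
        qb = histogram_quantile(edges_b, weights_b, t)
        total += (qa - qb) ** 2
    return sqrt(total / n_grid)


if __name__ == "__main__":
    # A uniform histogram on [0, 1] against the same shape shifted to [2, 3].
    print(l2_wasserstein([0, 1], [1], [2, 3], [1]))
```

For two histograms of identical shape, the distance reduces to the size of the shift between them, which is one reason this metric is convenient for comparing distributions. A multivariate histogram object would need one such distance per variable, combined in whatever way the method prescribes; that combination is not specified in the abstract and is left out here.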