Automatic variable and components weighting systems for Fuzzy cmeans of distributional data

Antonio Irpino; Francisco de A. T. De Carvalho; Rosanna Verde

Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

Last modified: 2017-05-20

Abstract

A distributional variable describes an object by a 1-D probability or frequency density function. While in standard clustering algorithms all variables contribute to the definition of the clusters with the same importance, subspace clustering aims at finding a subspace, as a linear combination of the original variables, where the clusters are well represented. This is done by weighting the variables automatically, according to their capacity to discriminate between clusters. Considering a decomposition of the squared $L_2$ Wasserstein distance for distributional data, and using the notion of adaptive distance, we extend a fuzzy subspace clustering algorithm to automatically compute relevance weights associated with the variables as well as with their components. The weights are computed either globally, for the whole dataset, or cluster-wise. An application shows the advantages of using such algorithms.
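To make the notion of "components" more concrete, the sketch below illustrates one well-known decomposition of the squared $L_2$ Wasserstein distance between two 1-D distributions: a term for the difference in means plus a term for the difference between the centered quantile functions. The abstract does not state which decomposition the authors use, so this is an assumption made for illustration only. The function names, the quantile grid size `n_grid`, and the component weights `w_mean` and `w_disp` are hypothetical, and the distributions are approximated from raw samples rather than from histograms.

```python
import numpy as np

def wasserstein2_decomposition(x, y, n_grid=1000):
    """Approximate the squared L2 Wasserstein distance between two 1-D
    samples and split it into a mean part and a centered (dispersion) part.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Midpoint grid on (0, 1) used to discretize the quantile functions.
    t = (np.arange(n_grid) + 0.5) / n_grid
    qx = np.quantile(x, t)
    qy = np.quantile(y, t)

    # Squared L2 distance between the two quantile functions.
    total = np.mean((qx - qy) ** 2)

    # Component 1: squared difference of the (discretized) means.
    mean_part = (qx.mean() - qy.mean()) ** 2

    # Component 2: squared L2 distance between the centered quantile functions.
    disp_part = np.mean(((qx - qx.mean()) - (qy - qy.mean())) ** 2)

    return total, mean_part, disp_part

def weighted_distance(mean_part, disp_part, w_mean=1.0, w_disp=1.0):
    """Adaptive-style distance: each component gets its own relevance weight.

    The weights here are fixed inputs (an assumption); in a subspace
    clustering algorithm they would be estimated from the data.
    """
    return w_mean * mean_part + w_disp * disp_part

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(loc=0.0, scale=1.0, size=500)
    b = rng.normal(loc=2.0, scale=3.0, size=500)
    total, mean_part, disp_part = wasserstein2_decomposition(a, b)
    print("total:", total)
    print("mean part + dispersion part:", mean_part + disp_part)
    print("weighted:", weighted_distance(mean_part, disp_part, 0.5, 2.0))
```

With unit weights the weighted distance reduces to the plain squared distance, since the two components sum to the total on the chosen grid. Unequal weights let one component (position or variability) count more than the other, which is the role the relevance weights play in an adaptive distance.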