Issues in joint dimension reduction and clustering methods
Last modified: 2018-05-17
Abstract
Joint data reduction (JDR) methods combine well-established unsupervised techniques such as dimension reduction and clustering. Distance-based clustering of high-dimensional data sets can be problematic because of the well-known curse of dimensionality. To tackle this issue, practitioners first apply principal component methods to reduce the dimensionality of the data, and then apply a clustering procedure to the resulting scores. JDR methods have been shown to outperform this sequential (tandem) approach on both continuous and categorical data sets. Over time, several JDR methods, followed by extensions, generalizations and modifications, have been proposed and appraised both theoretically and empirically. Some aspects, however, still deserve further work, such as the presence of (i) mixed continuous and categorical attributes and (ii) outliers that undermine the identification of the clustering structure. In this paper, we propose a JDR method for mixed data, built upon existing continuous-only and categorical-only JDR methods. We also appraise the sensitivity of the proposed method to the presence of outliers.
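For illustration only, the sequential (tandem) approach mentioned in the abstract can be sketched as follows. This is not the proposed JDR method: it is a minimal pure-Python sketch that assumes continuous data, a single principal component obtained by power iteration, and two clusters. All function names and the toy data are hypothetical.

```python
def center(rows):
    """Subtract the column means from every observation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def first_component(centered, iters=200):
    """Leading eigenvector of the sample covariance matrix (power iteration)."""
    n, p = len(centered), len(centered[0])
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # arbitrary starting vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def kmeans_1d(scores, iters=50):
    """Two-cluster k-means on one-dimensional scores."""
    centers = [min(scores), max(scores)]  # deterministic initialisation
    labels = [0] * len(scores)
    for _ in range(iters):
        labels = [0 if abs(s - centers[0]) <= abs(s - centers[1]) else 1
                  for s in scores]
        for k in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def tandem(rows):
    """Step 1: reduce to principal component scores. Step 2: cluster the scores."""
    centered = center(rows)
    v = first_component(centered)
    scores = [sum(x * c for x, c in zip(r, v)) for r in centered]
    return kmeans_1d(scores)


# Toy data (made up): two groups separated along the first variable.
group_a = [[0.0 + 0.1 * i, (i % 3) * 0.2, (i % 2) * 0.3] for i in range(10)]
group_b = [[10.0 + 0.1 * i, (i % 3) * 0.2, (i % 2) * 0.3] for i in range(10)]
data = group_a + group_b

labels = tandem(data)
print(labels)
```

In the tandem approach the two steps optimise separate criteria, so the retained component need not be the one that best separates the clusters. JDR methods instead combine dimension reduction and clustering in a single procedure.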