Open Conference Systems, CLADAG2023

A proposal for the joint automated detection of clusters and anomalies
Luis Angel García-Escudero, Christian Hennig, Agustín Mayo-Iscar, Gianluca Morelli, Marco Riani

Last modified: 2023-05-10

Abstract


Outliers can distort the results of many statistical techniques, and Cluster Analysis is no exception. With this in mind, the TCLUST method was introduced as a robust clustering alternative. Given a fixed trimming level $\alpha$, TCLUST attempts to detect the fraction $\alpha$ of observations that should best be discarded, assuming $k$ normally distributed components. The main problem, however, is how to determine reasonable values of $k$ and $\alpha$ for a given data set. An approach was introduced to choose $k$ and $\alpha$ through visual inspection of "classification trimmed likelihood" curves. Theoretical background will be provided for a better understanding of that approach, along with a parametric bootstrap method that reduces subjectivity and produces a small list of sensible robust clustering partitions.
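
To illustrate what trimming a fraction $\alpha$ of the observations means for fixed $k$, the following is a minimal sketch of trimmed k-means. This is a simplified relative of TCLUST that assumes spherical, equally weighted clusters and omits the constrained covariance estimation, so it should not be read as the authors' implementation. The function name `trimmed_kmeans`, the user-supplied initial centres, and the toy data are assumptions made purely for illustration.

```python
import math


def trimmed_kmeans(points, init_centers, alpha, n_iter=20):
    """Simplified trimmed k-means (spherical, equal-weight clusters only).

    Returns (centers, labels); a label of -1 marks a trimmed observation.
    """
    n = len(points)
    n_trim = math.floor(n * alpha)  # number of observations discarded
    centers = [tuple(c) for c in init_centers]
    labels = [-1] * n
    for _ in range(n_iter):
        # Squared distance from each point to its nearest centre.
        nearest = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            j = min(range(len(centers)), key=d2.__getitem__)
            nearest.append((d2[j], j))
        # Concentration step: keep the n - n_trim points closest to a centre.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        kept = set(order[: n - n_trim])
        new_labels = [nearest[i][1] if i in kept else -1 for i in range(n)]
        # Update each centre as the mean of its retained points.
        for j in range(len(centers)):
            members = [points[i] for i in range(n) if new_labels[i] == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:  # assignments stable: stop iterating
            break
        labels = new_labels
    return centers, labels


# Toy data (hypothetical): two tight groups plus one gross outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (100, 100)]
centers, labels = trimmed_kmeans(points, [(0, 0), (10, 10)], alpha=0.1)
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]
```

In this sketch both $k$ (through the number of initial centres) and $\alpha$ are supplied by the user, which is exactly the choice the abstract identifies as the main practical difficulty.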