Open Conference Systems, CLADAG2023

A proposal for the joint automated detection of clusters and anomalies
Luis Angel García-Escudero, Christian Hennig, Agustín Mayo-Iscar, Gianluca Morelli, Marco Riani

Last modified: 2023-05-10


Outliers are known to be problematic when statistical techniques are applied. Cluster Analysis is no exception and, with this in mind, the TCLUST method was introduced as a robust clustering alternative. Given a fixed trimming level $\alpha$, TCLUST searches for the fraction $\alpha$ of observations that is best discarded under the assumption of $k$ normally distributed components. The main practical problem, however, is how to choose reasonable values of $k$ and $\alpha$ for a given data set. An earlier approach chooses $k$ and $\alpha$ through visual inspection of "classification trimmed likelihood" curves. Theoretical background will be provided for a better understanding of that approach, along with a parametric bootstrap method that reduces subjectivity and produces a short list of sensible robust clustering partitions.
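
As a rough illustration of what a classification trimmed likelihood curve measures, the Python sketch below fits a heavily simplified trimmed clustering model and reports the maximized trimmed classification log-likelihood over a small grid of $k$ and $\alpha$. It is not the authors' implementation. It assumes spherical components with identity covariance, whereas TCLUST also estimates constrained scatter matrices. The function names (`log_term`, `assign`, `trimmed_fit`, `toy_data`) and the synthetic data are hypothetical and serve only this example. The parametric bootstrap step is not sketched.

```python
import math
import random


def log_term(x, mu, weight):
    """log(weight * N(x; mu, I)); -inf for an empty component."""
    if weight <= 0.0:
        return -math.inf
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mu))
    return math.log(weight) - 0.5 * len(x) * math.log(2 * math.pi) - 0.5 * sq_dist


def assign(data, centers, weights, n_keep):
    """Assign each point to its best component, then trim the worst-fitting points."""
    scored = []
    for i, x in enumerate(data):
        terms = [log_term(x, mu, w) for mu, w in zip(centers, weights)]
        j = max(range(len(terms)), key=terms.__getitem__)
        scored.append((terms[j], i, j))
    scored.sort(reverse=True)
    labels = [-1] * len(data)  # -1 marks a trimmed observation
    for _, i, j in scored[:n_keep]:
        labels[i] = j
    objective = sum(t for t, _, _ in scored[:n_keep])
    return objective, labels


def trimmed_fit(data, k, alpha, n_starts=20, n_iter=20, seed=0):
    """Best (objective, labels) over random starts (assumed identity covariance)."""
    rng = random.Random(seed)
    n = len(data)
    n_keep = n - int(round(n * alpha))  # number of untrimmed observations
    best = (-math.inf, None)
    for _ in range(n_starts):
        centers = [list(p) for p in rng.sample(data, k)]
        weights = [1.0 / k] * k
        for _ in range(n_iter):
            _, labels = assign(data, centers, weights, n_keep)
            # Re-estimate weights and centers from the retained points only.
            for j in range(k):
                members = [x for x, lab in zip(data, labels) if lab == j]
                weights[j] = len(members) / n_keep
                if members:
                    centers[j] = [sum(col) / len(members) for col in zip(*members)]
        objective, labels = assign(data, centers, weights, n_keep)
        if objective > best[0]:
            best = (objective, labels)
    return best


def toy_data(seed=1):
    """Synthetic example: two clusters of 45 points plus 10 distant outliers."""
    rng = random.Random(seed)
    a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(45)]
    b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(45)]
    out = [(50.0 + 5 * i, -50.0 - 5 * i) for i in range(10)]
    return a + b + out


if __name__ == "__main__":
    data = toy_data()
    # One curve per k: objective value as a function of the trimming level.
    for k in (1, 2, 3):
        curve = [trimmed_fit(data, k, a)[0] for a in (0.0, 0.05, 0.1, 0.15)]
        print(k, [round(v, 1) for v in curve])
```

Plotting each printed row against $\alpha$ gives one curve per $k$. In this toy setting, the point where adding a component or trimming further stops improving the objective noticeably is the kind of feature that visual inspection looks for.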