Open Conference Systems, CLADAG2023

Quantifying variable importance in cluster analysis
Christian Hennig, Keefe Murphy

Last modified: 2023-07-07


We propose to quantify the importance of variables in a cluster analysis by measuring the similarity between the clustering obtained from all variables and the clustering obtained by applying the same method with one variable left out. If the two clusterings are very similar, the left-out variable has little impact. An alternative is to replace the variable by a random permutation of its values instead of removing it. Beyond variable selection (on which we will not focus), measuring variable importance is useful for interpreting and understanding a clustering. We will also use variable importance measurement to discuss whether clustering methods appropriately balance the impact of different variables when clustering mixed-type data.
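The leave-one-out and permutation procedures can be sketched in pure Python. This is a minimal illustration under our own assumptions, not the authors' implementation. The clustering method is plain k-means with a deterministic farthest-first start, a stand-in for "the same method". Similarity is measured by the adjusted Rand index (ARI), and the importance of a variable is taken to be 1 - ARI. The function names `leave_one_out_importance` and `permutation_importance` are hypothetical.

```python
import random
from collections import Counter
from math import comb
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X: Matrix, k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with a deterministic farthest-first start (illustrative stand-in)."""
    centers = [list(X[0])]
    while len(centers) < k:
        # Next initial center: the point farthest from all centers chosen so far.
        far = max(X, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        new = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in X]
        if new == labels:  # converged: assignments no longer change
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Hubert-Arabie adjusted Rand index between two label vectors."""
    index = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both are a single cluster
        return 1.0
    return (index - expected) / (max_index - expected)


Clusterer = Callable[[Matrix, int], List[int]]


def leave_one_out_importance(X: Matrix, k: int, cluster: Clusterer = kmeans) -> List[float]:
    """Importance of variable j = 1 - ARI(clustering on all variables, clustering without j)."""
    full = cluster(X, k)
    scores = []
    for j in range(len(X[0])):
        reduced = [row[:j] + row[j + 1:] for row in X]  # drop column j
        scores.append(1.0 - adjusted_rand_index(full, cluster(reduced, k)))
    return scores


def permutation_importance(X: Matrix, k: int, n_rep: int = 20, seed: int = 0,
                           cluster: Clusterer = kmeans) -> List[float]:
    """Importance of variable j = mean of 1 - ARI after randomly permuting column j."""
    rng = random.Random(seed)
    full = cluster(X, k)
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_rep):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between variable j and the other variables
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += 1.0 - adjusted_rand_index(full, cluster(permuted, k))
        scores.append(total / n_rep)
    return scores
```

Consider eight points whose first coordinate (0 or 10) defines two groups. Their second coordinate is 0 for two points and 1 for two points within each group. For these data, `leave_one_out_importance(X, 2)` gives 0.0 for the second variable and 7/6 (about 1.17) for the first. The score can exceed 1 because the ARI is negative when agreement is below chance level.

In this toy example the variables are on comparable scales. With mixed-type data, how strongly each variable can influence the result depends on the method's distance and scaling choices, which is the balance question raised in the abstract.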