Random-Based Initialization for Clustering Mixed-Type Data with the k-Prototypes Algorithm

Rabea Aschenbruck; Gero Szepannek; Adalbert Franz Xaver Wilhelm

Open Conference Systems, CLADAG2023

Last modified: 2023-06-29

Abstract

The k-prototypes algorithm is one of the most popular partitioning cluster algorithms for mixed-type data. Because of its iterative structure, the algorithm may converge only to a local optimum rather than the global one, so the resulting cluster partition can depend on the initialization. In general, there are two ways to improve the initialization: one is to determine concrete initial cluster prototypes, and the other is to repeat the algorithm with different randomly chosen initial objects. Different numbers of algorithm repetitions are analyzed and compared. It is shown that an appropriate choice of the number of repetitions improves the cluster algorithm’s target criterion, even with manageable time expenditure.
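
The repetition strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly used k-prototypes distance (squared Euclidean distance on numeric variables plus a weight gamma times the number of mismatches on categorical variables) and keeps the run with the lowest total within-cluster distance as the target criterion. The names `distance`, `k_prototypes`, `best_of_restarts`, `n_start`, and `gamma`, as well as the toy data, are hypothetical choices made for this illustration.

```python
import random
from collections import Counter


def distance(x, proto, num_idx, cat_idx, gamma):
    """Assumed mixed-type distance: squared Euclidean part on numeric
    variables plus gamma times the number of categorical mismatches."""
    num = sum((x[j] - proto[j]) ** 2 for j in num_idx)
    cat = sum(x[j] != proto[j] for j in cat_idx)
    return num + gamma * cat


def k_prototypes(data, k, num_idx, cat_idx, gamma, rng, max_iter=100):
    """One run, started from k randomly chosen objects as prototypes."""
    protos = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest prototype.
        new_labels = [
            min(range(k),
                key=lambda c: distance(x, protos[c], num_idx, cat_idx, gamma))
            for x in data
        ]
        if new_labels == labels:
            break  # partition unchanged: a (possibly local) optimum
        labels = new_labels
        # Update step: mean for numeric, mode for categorical variables.
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            if not members:
                continue  # keep the old prototype of an empty cluster
            for j in num_idx:
                protos[c][j] = sum(m[j] for m in members) / len(members)
            for j in cat_idx:
                protos[c][j] = Counter(m[j] for m in members).most_common(1)[0][0]
    # Target criterion: total distance of objects to their prototypes.
    cost = sum(distance(x, protos[l], num_idx, cat_idx, gamma)
               for x, l in zip(data, labels))
    return cost, labels, protos


def best_of_restarts(data, k, num_idx, cat_idx, gamma=1.0, n_start=10, seed=0):
    """Repeat the algorithm n_start times and keep the lowest-cost run."""
    rng = random.Random(seed)
    runs = [k_prototypes(data, k, num_idx, cat_idx, gamma, rng)
            for _ in range(n_start)]
    return min(runs, key=lambda run: run[0])


# Toy mixed-type data (made up): one numeric and one categorical variable.
data = [(1.0, "a"), (1.2, "a"), (0.8, "b"), (5.0, "b"),
        (5.3, "b"), (4.9, "a"), (9.0, "c"), (9.2, "c")]
cost, labels, protos = best_of_restarts(data, k=2, num_idx=[0], cat_idx=[1])
print(cost, labels)
```

Because all repetitions in this sketch draw from the same seeded generator, the first repetition with `n_start=10` is identical to the single run with `n_start=1`, so adding repetitions can only keep or lower the reported criterion. The price is a run time that grows roughly linearly with `n_start`.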