Dirichlet processes, posterior similarity and graph clustering

Stefano Tonellato

Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

Last modified: 2018-05-18

Abstract

This paper proposes a clustering method based on the sequential estimation of the random partition induced by the Dirichlet process. Our approach relies on the Sequential Importance Resampling (SIR) algorithm and on the estimation of the posterior probability that each pair of individuals is generated by the same mixture component. Such estimates do not require the identification of mixture components and are therefore not affected by label switching. A similarity matrix can then be built easily, allowing for the construction of a weighted undirected graph. A random walk can be defined on this graph, with dynamics closely linked to the posterior similarity. A community detection algorithm, the map equation, can then be applied to obtain a clustering that minimises an information-theoretic criterion.
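
To make the link between sampled partitions, posterior similarity and the random walk concrete, the following is a minimal sketch in Python, not the author's implementation. It assumes the SIR step has already produced a set of label vectors (one sampled partition per row) with importance weights; the function names `posterior_similarity` and `random_walk_matrix`, the toy input and the uniform default weights are all hypothetical. The map equation step itself is not shown.

```python
import numpy as np


def posterior_similarity(partitions, weights=None):
    """Weighted estimate of the probability that each pair shares a component.

    partitions: (S, n) integer array, one sampled partition per row (assumed input).
    weights: S nonnegative importance weights (assumed input); uniform if None.
    """
    partitions = np.asarray(partitions)
    n_draws = partitions.shape[0]
    if weights is None:
        weights = np.full(n_draws, 1.0 / n_draws)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to one
    # same[s, i, j] is True when i and j share a label in draw s. Only equality
    # of labels within a draw matters, so relabelling a draw changes nothing.
    same = partitions[:, :, None] == partitions[:, None, :]
    return np.einsum("s,sij->ij", weights, same.astype(float))


def random_walk_matrix(similarity):
    """Row-normalise the similarity graph into random walk transition probabilities."""
    w = np.array(similarity, dtype=float)
    np.fill_diagonal(w, 0.0)  # drop self-loops (an assumption of this sketch)
    strength = w.sum(axis=1)
    p = np.zeros_like(w)
    connected = strength > 0  # isolated nodes keep an all-zero row
    p[connected] = w[connected] / strength[connected, None]
    return p


# Toy input: three sampled partitions of four individuals. The first two rows
# describe the same partition with the labels swapped.
draws = np.array([[0, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 0, 0, 1]])
psm = posterior_similarity(draws)
transition = random_walk_matrix(psm)
```

In this toy input the first two draws differ only by a relabelling, yet they contribute identically to `psm`, which illustrates why pairwise co-clustering estimates are unaffected by label switching. The matrix `transition` could then be passed, as a weighted graph, to any community detection routine that implements the map equation.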