Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

Dirichlet processes, posterior similarity and graph clustering
Stefano Tonellato

Last modified: 2018-05-18


This paper proposes a clustering method based on the sequential estimation of the random partition induced by the Dirichlet process. Our approach relies on the Sequential Importance Resampling (SIR) algorithm and on the estimation of the posterior probability that each pair of individuals is generated by the same mixture component. Such estimates do not require the identification of mixture components and are therefore unaffected by label switching. From these estimates, a similarity matrix can easily be built, which in turn defines a weighted undirected graph. A random walk can be defined on this graph, whose dynamics are closely linked to the posterior similarity. A community detection algorithm, the map equation, can then be applied to obtain a clustering that minimises an information-theoretic criterion.
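
As an illustration of the similarity and random-walk steps, the following minimal sketch assumes that the SIR algorithm has already produced a set of weighted particles, each carrying one partition of the individuals. The function names, the toy particles and the weights are hypothetical and are not taken from the paper. The first two particles encode the same partition with swapped labels, which shows why pairwise co-clustering estimates are unaffected by label switching. The map equation step is not implemented here.

```python
def posterior_similarity(partitions, weights):
    """Estimate, for each pair (i, j), the posterior probability that
    i and j share a mixture component, as the weighted fraction of
    particles in which their labels coincide."""
    n = len(partitions[0])
    total = sum(weights)
    sim = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                # Only equality of labels matters, not the label values.
                if labels[i] == labels[j]:
                    sim[i][j] += w / total
    return sim


def random_walk_matrix(sim):
    """Row-normalise the off-diagonal similarities (edge weights of the
    undirected graph) into random-walk transition probabilities."""
    n = len(sim)
    trans = []
    for i in range(n):
        row = [sim[i][j] if j != i else 0.0 for j in range(n)]
        s = sum(row)
        # An isolated node gets an all-zero row in this sketch.
        trans.append([x / s if s > 0 else 0.0 for x in row])
    return trans


# Hypothetical particles over four individuals (assumed SIR output).
partitions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
weights = [0.5, 0.25, 0.25]

sim = posterior_similarity(partitions, weights)
trans = random_walk_matrix(sim)
```

In this toy example, individuals 0 and 1 are grouped together in every particle, so their estimated similarity is 1, while individuals 0 and 3 never share a component and are not connected in the graph. The transition matrix could then be passed to a community detection routine of the reader's choice.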
