Open Conference Systems, CLADAG2023

LATENT BAYESIAN CLUSTERING FOR TOPIC MODELLING
Lorenzo Schiavon

Last modified: 2023-06-06

Abstract


The main objective in topic modelling is to uncover the underlying themes present in a corpus of text data. This process generally consists of two phases: (i) identifying the main words associated with each topic; (ii) grouping together documents that contain similar sets of words. In this work, we exploit recent advances in Bayesian factor models to represent the high-dimensional space of the observed words through a set of low-dimensional latent variables, and to jointly cluster the documents according to their distribution over such latent constructs. Groups and underlying constructs are interpreted as document topics and language concepts, respectively, and the number of such dimensions need not be specified in advance. We apply the proposed approach to a data set of newspaper headlines.