Open Conference Systems, CLADAG2023

Multivariate Regression Tree Topic Modeling
Marco Ortu, Giulia Contu, Luca Frigau

Last modified: 2023-06-27

Abstract


In this paper we propose the Multivariate Tree Topic Modeling methodology, a general-purpose approach that refines the output of a Topic Modeling method using Multivariate Trees in order to obtain consistent groups of documents. Topic modeling is a mechanism for discovering low-dimensional, multi-faceted summaries of textual documents, typically by uncovering hidden (latent) topics in a corpus. Given these latent topics, we use Multivariate Trees to obtain document groups that are more homogeneous than those produced by the Topic Modeling output alone. We applied our model to two standard corpora commonly used in this kind of study and show that, when the aim of Topic Modeling is to generate coherent clusters of documents, the use of Multivariate Trees improves the overall coherence of these clusters across a wide range of tree sizes.
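
The idea of grouping documents with a multivariate tree can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the predictors, the splitting criterion, or the topic model used. The sketch assumes that each document already has a vector of topic proportions (the multivariate response `Y`, from any topic model) and a few numeric predictors (`X`, hypothetical here), and it grows a tree by greedily choosing the split that most reduces the sum of squared deviations over all topic dimensions, a common criterion for multivariate regression trees. The leaves are then read as document groups. All function names and the toy data are illustrative.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def sse(rows: Matrix) -> float:
    """Sum of squared deviations from the column means, over all columns."""
    if not rows:
        return 0.0
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return sum((r[j] - means[j]) ** 2 for r in rows for j in range(k))


def best_split(X: Matrix, Y: Matrix, idx: List[int]) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that most reduces the multivariate SSE."""
    parent = sse([Y[i] for i in idx])
    best, best_gain = None, 1e-12  # ignore splits with no real improvement
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in idx})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [Y[i] for i in idx if X[i][f] <= t]
            right = [Y[i] for i in idx if X[i][f] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best_gain:
                best, best_gain = (f, t), gain
    return best


def grow(X: Matrix, Y: Matrix, idx: List[int], depth: int, max_depth: int) -> List[List[int]]:
    """Recursively split; return the leaves as lists of document indices."""
    split = best_split(X, Y, idx) if depth < max_depth else None
    if split is None:
        return [idx]
    f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return grow(X, Y, left, depth + 1, max_depth) + grow(X, Y, right, depth + 1, max_depth)


# Toy data (hypothetical): 6 documents, 2 predictors, 3 topic proportions each.
X = [[0.1, 5.0], [0.2, 4.0], [0.3, 6.0], [0.8, 5.0], [0.9, 4.0], [0.7, 6.0]]
Y = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05],
     [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.85, 0.05]]

# max_depth controls the tree size; each leaf is one document group.
leaves = grow(X, Y, list(range(len(X))), depth=0, max_depth=1)
print(leaves)
```

In this sketch `max_depth` plays the role of the tree size mentioned in the abstract: a deeper tree yields more, smaller groups, each with lower within-group variation in topic proportions.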