Fuzzy Ensemble Machine Learning algorithm to improve prediction

Nicolò Biasetton; Riccardo Ceccato; Marta Disegna; Alberto Molena

Open Conference Systems, CLADAG2023

Last modified: 2023-06-14

Abstract

Predictive data analytics refers to building models that make predictions based on patterns extracted from historical data, in order to gain a precise idea of what might happen in the future. In this context, Supervised Machine Learning algorithms are widely used to build prediction models. In this study, we suggest using a sequential combination of Unsupervised and Supervised Machine Learning algorithms to improve the accuracy of the final prediction model. The idea of combining cluster analysis and regression models was first developed at the beginning of the 20th century, and since then academics have grown increasingly interested in this approach thanks to the demonstrated gain in prediction accuracy. After a comprehensive literature review on ensemble Machine Learning used for prediction purposes, the suggested method will be presented theoretically and empirically. The idea is to combine the results of the Fuzzy C-Medoids clustering algorithm with a Machine Learning prediction model. We suggest using fuzzy clustering to group observations for two reasons: 1) to identify fuzzy units, which are discarded from further analysis; 2) to use the membership degrees as weights in the prediction model. Regarding the supervised regression model adopted in the second step, ten different algorithms have been compared in order to select the model with the highest performance. The suggested fuzzy ensemble Machine Learning model has finally been compared, on the same testing data, with the best supervised regression algorithm without clustering and with other clustering-regression models. Findings reveal that the suggested method outperforms both previous clustering-regression models and the traditional supervised regression model. In the conclusion, some suggestions and recommendations for future analysis will be discussed.
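
The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the medoids are taken as given rather than estimated, memberships use the standard fuzzy formula with fuzzifier `m`, a unit counts as "fuzzy" when its largest membership falls below a hypothetical `threshold`, one model is fitted per cluster, and weighted linear regression stands in for whichever of the ten compared algorithms performs best. The function names are illustrative only.

```python
import numpy as np

def fuzzy_memberships(X, medoids, m=2.0, eps=1e-12):
    """Membership degree of each unit to each medoid (each row sums to 1)."""
    # Squared Euclidean dissimilarity between every unit and every medoid.
    d = ((X[:, None, :] - medoids[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fit_weighted_linear(X, y, w):
    """Weighted least squares with an intercept (stand-in regressor)."""
    A = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef  # [intercept, slope_1, ..., slope_p]

def fuzzy_ensemble_fit(X, y, medoids, threshold=0.6, m=2.0):
    """Step 1: drop fuzzy units. Step 2: membership-weighted model per cluster."""
    U = fuzzy_memberships(X, medoids, m)
    keep = U.max(axis=1) >= threshold      # assumed rule for "fuzzy" units
    labels = U.argmax(axis=1)
    models = {}
    for c in range(medoids.shape[0]):
        idx = keep & (labels == c)
        if idx.sum() >= X.shape[1] + 1:    # enough units to fit the model
            models[c] = fit_weighted_linear(X[idx], y[idx], U[idx, c])
    return models, keep
```

In this sketch a unit lying halfway between two medoids receives membership 0.5 in each cluster and is therefore excluded before any regression is fitted, while the remaining units contribute to their cluster's model in proportion to their membership degree.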