Structured Approaches for High-Dimensional Predictive Modeling

Marco Seabra dos Reis

Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

Last modified: 2017-06-14

Abstract

Current predictive analytics approaches focus strongly on optimizing accuracy metrics, leaving little room to incorporate a priori knowledge about the processes under analysis and relegating the interpretation of results to a secondary concern (Hastie, Tibshirani, & Friedman, 2001; Reis & Saraiva, 2005; Rendall, Pereira, & Reis, 2017). However, in the analysis of complex systems, one of the main interests is precisely the induction of relevant associations, in order to understand or clarify the way systems operate. Moreover, information about the structure of the processes is often available and could be used to benefit the analysis and to enhance the interpretation of results. This issue is not new: it has motivated the development of multiblock approaches that try to improve the interpretation of results while maintaining the quality of predictions (Naes, Tomic, Afseth, Segtnan, & Måge, 2013; Tenenhaus & Tenenhaus, 2014; Trygg & Wold, 1998; Westerhuis, Kourti, & MacGregor, 1998).

This paper addresses two classes of multiblock frameworks that offer interpretation-oriented features while allowing some system structure to be incorporated. One class relies on a priori knowledge to build the blocks of variables, while the other extracts the system structure in a data-driven way (Reis, 2013a, 2013b). Introducing such block structures into the predictive platforms constrains their predictive spaces for the sake of enabling interpretable elements in the final model. These constraints do not usually compromise the methods' performance when compared to their unconstrained counterparts, and sometimes even lead to improvements in prediction ability, due to the use of more parsimonious and robust models.