Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

Statistical categorization through archetypal analysis
Francesco Palumbo, Giancarlo Ragozini

Last modified: 2017-05-15

Abstract


Human knowledge develops through complex relationships between categories. In the era of Big Data, categorization implies summarizing data in a limited number of groups that must be well separated and, at the same time, maximally homogeneous internally. This proposal exploits the ability of archetypal analysis to find a set of extreme points that can summarize the entire data set in homogeneous groups. Archetypes are then used to identify the best prototypes according to Rosch's definition. Finally, following the geometric approach to cognitive science, the Voronoi tessellation based on the prototypes is used to define a categorization. An example on the well-known wine data set of Forina et al. illustrates the procedure.
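
The final step, categorization by a Voronoi tessellation of the prototypes, amounts to assigning each observation to its nearest prototype. The sketch below illustrates only that step. It assumes Euclidean distance and takes the prototypes as given; the function name `voronoi_categorize` and the toy coordinates are hypothetical and are not taken from the paper or from the wine data.

```python
import math

def voronoi_categorize(points, prototypes):
    """Return, for each point, the index of its nearest prototype.

    Assumption: Euclidean distance. Each index identifies the Voronoi
    cell (category) that the point falls into.
    """
    labels = []
    for p in points:
        distances = [math.dist(p, q) for q in prototypes]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical toy data: three prototypes and four observations in 2-D.
prototypes = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(1.0, 1.0), (9.0, -1.0), (5.0, 6.0), (4.0, 1.0)]

labels = voronoi_categorize(points, prototypes)
print(labels)
```

Ties are resolved in favour of the prototype with the lowest index, which is a convenience of this sketch rather than part of the method.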