Open Conference Systems, STATISTICS AND DATA SCIENCE: NEW CHALLENGES, NEW GENERATIONS

Introduction to Symbolic Data Analysis and application to post clustering for comparing and improving clustering methods by the Symbolic Data Table that they induce
Edwin Diday

Last modified: 2017-05-24

Abstract


We first recall that Symbolic Data Analysis (SDA) is a way of thinking by classes in Data Science. In SDA, classes of standard units become the new statistical units, at a higher level than the initial standard statistical units. Classes are considered as objects to be described in all their facets by “symbolic data” that take their internal variability into account while staying close to the user's language. We then focus on different strategies for building a symbolic data table from a standard data table: partitioning (k-means, dynamic clustering), fuzzy clustering (by EM or other methods), and mixture decomposition of copulas (by a “copula-EM” or a “copula-dynamic clustering”). A few words will also be said on how to build classes at the second level (where the units are themselves classes) by using Dirichlet models. Finally, we give tools for measuring the quality of the resulting symbolic data tables. In this way, we can compare the associated clustering methods and improve them.
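
As an illustration of the partition-based strategy, the following minimal Python sketch aggregates a standard data table into a symbolic data table, one row per class. It is not the method presented in the talk: the function name `build_symbolic_table`, the toy data, and the choice of symbolic descriptions (a min-max interval for numeric variables, relative category frequencies for categorical ones) are assumptions made for this example. The class labels are taken as given, as if they had been produced by k-means or any other partitioning method.

```python
from collections import Counter, defaultdict


def build_symbolic_table(rows, labels, numeric_vars, categorical_vars):
    """Aggregate standard units (rows) into one symbolic description per class.

    Hypothetical helper: numeric variables become (min, max) intervals,
    categorical variables become relative frequency distributions.
    """
    # Group the standard units by the class label given by the partition.
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[label].append(row)

    table = {}
    for label, members in classes.items():
        description = {}
        for var in numeric_vars:
            values = [m[var] for m in members]
            # The interval keeps the internal variability of the class.
            description[var] = (min(values), max(values))
        for var in categorical_vars:
            counts = Counter(m[var] for m in members)
            description[var] = {
                cat: n / len(members) for cat, n in counts.items()
            }
        table[label] = description
    return table


# Toy standard data table (invented values, for illustration only).
rows = [
    {"age": 23, "income": 1800, "sector": "retail"},
    {"age": 31, "income": 2400, "sector": "retail"},
    {"age": 45, "income": 5200, "sector": "finance"},
    {"age": 52, "income": 6100, "sector": "finance"},
    {"age": 38, "income": 4800, "sector": "retail"},
]
# Class labels assumed to come from a partitioning method such as k-means.
labels = [0, 0, 1, 1, 1]

symbolic_table = build_symbolic_table(
    rows, labels, numeric_vars=["age", "income"], categorical_vars=["sector"]
)
for label in sorted(symbolic_table):
    print(label, symbolic_table[label])
```

Each class is now a higher-level unit described by intervals and distributions instead of single values. Running the same aggregation on the partitions produced by different clustering methods yields the symbolic data tables that a quality measure would then compare.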