Weighted vs unweighted distance based decision trees for ranking data

Antonella Plaia; Simona Buscemi; Mariangela Sciandra

Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

Last modified: 2018-05-17

Abstract

Preference data are a particular type of ranking data (widely used in sports, web search, and the social sciences) in which a group of people express their preferences over a set of alternatives. Within this framework, distance-based decision trees are a non-parametric tool for identifying the profiles of subjects who give similar rankings.

This paper aims to assess, for both complete and incomplete ranking data, the impact of differently structured weighted distances on building decision trees. Traditional metrics between rankings do not take into account the importance of swapping elements that are similar to each other (element weights) or elements at the top (or the bottom) of an ordering (position weights). By means of simulations, using weighted distances both to generate rankings and to build decision trees, we will evaluate the impact of different weighting systems on both splitting and consensus ranking. The distances used satisfy Kemeny's axioms and, accordingly, the rank correlation coefficient τ_x, proposed by Emond and Mason, will be used both for pruning the trees and for assessing their "goodness".
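To illustrate why position weights matter, the following minimal Python sketch computes the unweighted Kemeny distance and the τ_x coefficient for complete rankings without ties. It is an illustration only, not the authors' implementation: the function names (`score_matrix`, `kemeny_distance`, `tau_x`) and the example rankings are assumptions introduced here, and incomplete or tied rankings are not handled. It shows that the unweighted distance scores a swap at the top of an ordering exactly like a swap at the bottom.

```python
def score_matrix(ranking):
    """Pairwise score matrix for a complete ranking without ties.

    ranking[i] is the rank position of item i (1 = most preferred).
    Entry (i, j) is 1 if item i is ranked ahead of item j, -1 if it is
    ranked behind, and 0 on the diagonal.
    """
    n = len(ranking)
    return [
        [0 if i == j else (1 if ranking[i] < ranking[j] else -1) for j in range(n)]
        for i in range(n)
    ]


def kemeny_distance(r1, r2):
    """Unweighted Kemeny distance: half the sum of absolute score differences."""
    a, b = score_matrix(r1), score_matrix(r2)
    n = len(r1)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) // 2


def tau_x(r1, r2):
    """Rank correlation tau_x, restricted here to tie-free complete rankings."""
    a, b = score_matrix(r1), score_matrix(r2)
    n = len(r1)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * (n - 1))


# Hypothetical example: four items, ranks listed per item.
base = [1, 2, 3, 4]
top_swap = [2, 1, 3, 4]     # the two most preferred items exchanged
bottom_swap = [1, 2, 4, 3]  # the two least preferred items exchanged

d_top = kemeny_distance(base, top_swap)
d_bottom = kemeny_distance(base, bottom_swap)
t_top = tau_x(base, top_swap)
t_bottom = tau_x(base, bottom_swap)

print(d_top, d_bottom)  # both swaps receive the same unweighted distance
print(t_top, t_bottom)  # and therefore the same tau_x
```

Both swaps give the same distance and the same τ_x, which is the insensitivity to position that a position-weighted distance is meant to remove. For tie-free complete rankings of n items, the two quantities are linked by τ_x = 1 - 2d / (n(n - 1)), where d is the Kemeny distance.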
