Open Conference Systems, 50th Scientific meeting of the Italian Statistical Society

On the choice of an appropriate bandwidth for modal clustering
Alessandro Casa, José Chacón, Giovanna Menardi

Last modified: 2018-05-17

Abstract


In the modal clustering framework, groups are regarded as the domains of attraction of the modes of the probability density function underlying the data. Operationally, obtaining a partition requires a nonparametric density estimate, and the kernel density estimator is commonly used. With these methods, a key issue is the selection of the smoothing parameter (the bandwidth), which governs the shape of the estimated density and hence, possibly, its modal structure. In this work we propose a criterion for choosing the bandwidth that is specifically tailored to the clustering problem: it is based on minimizing the distance between the partition of the data induced by the kernel estimator and the whole-space partition induced by the true density.
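To illustrate how the bandwidth can change the modal structure, and with it the partition, the following is a minimal sketch and not the criterion proposed in this work. It assumes one-dimensional data, a Gaussian kernel, and a plain mean-shift ascent to assign each observation to a mode. The function names, the toy data, and the tolerance values are hypothetical choices made for the example.

```python
import math

def mean_shift(data, h, max_iter=500, tol=1e-8):
    """Shift each point uphill on a Gaussian KDE with bandwidth h until it stalls."""
    points = list(data)
    for _ in range(max_iter):
        shifted = []
        for x in points:
            # Gaussian kernel weights of every observation as seen from x
            weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data]
            # Mean-shift update: kernel-weighted average of the observations
            shifted.append(sum(w * xi for w, xi in zip(weights, data)) / sum(weights))
        largest_move = max(abs(a - b) for a, b in zip(points, shifted))
        points = shifted
        if largest_move < tol:
            break
    return points

def modal_partition(data, h, merge_tol=1e-3):
    """Label observations by the mode at which their mean-shift path ends."""
    limits = mean_shift(data, h)
    order = sorted(range(len(data)), key=lambda i: limits[i])
    labels = [0] * len(data)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        if limits[cur] - limits[prev] > merge_tol:
            cluster += 1  # a gap between limit points marks a new mode
        labels[cur] = cluster
    return labels

# Hypothetical toy data: two evenly spaced groups centred at -3 and 3.
data = [-4 + 2 * i / 49 for i in range(50)] + [2 + 2 * i / 49 for i in range(50)]

for h in (0.5, 5.0):
    labels = modal_partition(data, h)
    print(f"h = {h}: {len(set(labels))} cluster(s)")
```

On this toy sample, a small bandwidth keeps the two groups apart, while a large one smooths the estimate into a single mode and merges them. This sensitivity is what motivates a bandwidth selector aimed at the partition rather than at the density itself.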

References


1. Chacón, J.E.: A population background for nonparametric density-based clustering. Stat Sci, 30(4): 518-532 (2015).

2. McNicholas, P.D.: Model-based clustering. J Classif, 33(3): 331-373 (2016).

3. Menardi, G.: A review on modal clustering. Int Stat Rev, 84(3): 413-433 (2016).

4. Romano, J.P.: On weak convergence and optimality of kernel density estimates of the mode. Ann Stat, 16(2): 629-647 (1988).

5. Samworth, R.J. & Wand, M.P.: Asymptotics and optimal bandwidth selection for highest density region estimation. Ann Stat, 38(3): 1767-1792 (2010).

6. Stuetzle, W.: Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J Classif, 20(1): 25-47 (2003).

7. Wand, M.P. & Jones, M.C.: Kernel smoothing. Chapman & Hall (1994).

