On the choice of an appropriate bandwidth for modal clustering
Alessandro Casa, José Chacón, Giovanna Menardi
Last modified: 2018-05-17
Abstract
In the modal clustering framework, groups are regarded as the domains of attraction of the modes of the probability density function underlying the data. Operationally, obtaining a partition requires a nonparametric density estimate, and the kernel density estimator is commonly used. With these methods, a relevant issue is the selection of the smoothing parameter, which governs the shape of the estimated density and hence, possibly, its modal structure. In this work we propose a criterion for choosing the bandwidth that is specifically tailored to the clustering problem, since it is based on minimizing the distance between a partition of the data induced by the kernel estimator and the whole-space partition induced by the true density.
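To illustrate how the smoothing parameter can change the modal structure, and hence the number of clusters, the following is a minimal sketch in Python using only NumPy. It does not implement the criterion proposed in this work; it only counts the modes of a one-dimensional Gaussian kernel density estimate for a few bandwidths. The simulated two-component sample, the evaluation grid, the bandwidth values and the helper names `gaussian_kde_on_grid` and `count_modes` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_on_grid(sample, grid, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    z = (grid[:, None] - sample[None, :]) / h
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (len(sample) * h * np.sqrt(2.0 * np.pi))

def count_modes(density):
    """Count strict interior local maxima of a density evaluated on a grid."""
    interior = density[1:-1]
    is_peak = (interior > density[:-2]) & (interior > density[2:])
    return int(is_peak.sum())

# Hypothetical data: two well-separated Gaussian groups (illustrative only).
rng = np.random.default_rng(0)
sample = np.concatenate([
    rng.normal(-2.0, 0.5, 150),
    rng.normal(2.0, 0.5, 150),
])
grid = np.linspace(-6.0, 6.0, 1201)

# Number of estimated modes (candidate clusters) for each bandwidth.
modes_by_h = {
    h: count_modes(gaussian_kde_on_grid(sample, grid, h))
    for h in (0.05, 0.5, 5.0)
}
for h, n_modes in modes_by_h.items():
    print(f"h = {h}: {n_modes} mode(s)")
```

Under these assumptions, a very large bandwidth merges the two groups into a single mode, while a very small one tends to produce spurious modes; a clustering-oriented bandwidth choice aims to sit between these extremes.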
References
1. Chacón, J.E.: A population background for nonparametric density-based clustering. Stat Sci, 30(4): 518-532 (2015).
2. McNicholas, P.D.: Model-based clustering. J Classif, 33(3): 331-373 (2016).
3. Menardi, G.: A review on modal clustering. Int Stat Rev, 84(3): 413-433 (2016).
4. Romano, J.P.: On weak convergence and optimality of kernel density estimates of the mode. Ann Stat, 16(2): 629-647 (1988).
5. Samworth, R.J. & Wand, M.P.: Asymptotics and optimal bandwidth selection for highest density region estimation. Ann Stat, 38(3): 1767-1792 (2010).
6. Stuetzle, W.: Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J Classif, 20(1): 25-47 (2003).
7. Wand, M.P. & Jones, M.C.: Kernel smoothing. Chapman & Hall (1994).