Flexible clustering methods for high-dimensional data sets

Last modified: 2018-05-17

#### Abstract

Finite mixture models assume that a population is a convex combination of densities; therefore, they are well suited for clustering applications. Each cluster is modeled by one component density. One of the most flexible distributions is the generalized hyperbolic distribution (GHD): it can handle skewness and heavy tails, and it has many well-known distributions as special or limiting cases. The multiple scaled GHD (MSGHD) and the mixture of coalesced GHDs (MCGHD) are even more flexible methods that can detect non-elliptical, and even non-convex, clusters. The drawback of this flexibility is heavy parametrization, which is especially severe for high-dimensional data because the number of parameters grows quadratically with the number of variables. Therefore, the aforementioned methods are not well suited to clustering high-dimensional data. However, the eigen-decomposition of the component scale matrix can naturally be used for dimension reduction, yielding a transformation of the MSGHD and MCGHD that is better suited to high-dimensional data clustering.
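To make the parametrization argument concrete, the sketch below counts free parameters per component and for a whole mixture, assuming a GHD component is parameterized by a location vector mu, a skewness vector alpha, a symmetric scale matrix Sigma, and scalar index and concentration parameters (lambda, omega), and that an MSGHD component writes Sigma = Gamma Phi Gamma^T (orthogonal Gamma, diagonal Phi) with one (lambda, omega) pair per dimension. It also shows the eigen-decomposition of a scale matrix and a projection onto its leading directions. All function names are hypothetical, and keeping the directions with the largest eigenvalues is a generic PCA-style choice used for illustration, not necessarily the reduction the authors propose.

```python
import numpy as np


def ghd_params(p: int) -> int:
    """Free parameters of one p-variate GHD component (assumed
    (mu, alpha, Sigma, lambda, omega) parameterization)."""
    # location mu and skewness alpha: p each
    # symmetric scale matrix Sigma: p(p + 1) / 2
    # index lambda and concentration omega: 1 each
    return 2 * p + p * (p + 1) // 2 + 2


def msghd_params(p: int) -> int:
    """Free parameters of one MSGHD component, assuming
    Sigma = Gamma Phi Gamma^T and one (lambda, omega) pair per dimension."""
    # mu, alpha: 2p; orthogonal Gamma: p(p - 1) / 2; diagonal Phi: p
    # per-dimension lambda_j and omega_j: 2p
    return 2 * p + p * (p - 1) // 2 + p + 2 * p


def mixture_params(n_components: int, per_component: int) -> int:
    # the mixing proportions sum to one, so only G - 1 of them are free
    return n_components * per_component + (n_components - 1)


def leading_scale_directions(Sigma: np.ndarray, q: int):
    """Eigen-decompose Sigma = Gamma diag(phi) Gamma^T and keep the q
    eigenvectors with the largest eigenvalues (hypothetical selection rule)."""
    phi, Gamma = np.linalg.eigh(Sigma)    # eigenvalues in ascending order
    order = np.argsort(phi)[::-1][:q]     # indices of the q largest eigenvalues
    return Gamma[:, order], phi[order]


if __name__ == "__main__":
    # total parameters for a G = 3 mixture as the number of variables p grows
    for p in (2, 10, 50, 100):
        print(p, mixture_params(3, ghd_params(p)), mixture_params(3, msghd_params(p)))

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    Sigma = A @ A.T + 5 * np.eye(5)       # a symmetric positive-definite scale matrix
    Gamma_q, phi_q = leading_scale_directions(Sigma, q=2)
    X = rng.standard_normal((100, 5))
    Z = X @ Gamma_q                       # data projected onto q = 2 directions
    print(Z.shape)                        # (100, 2)
```

Under these assumptions, a three-component mixture already needs more than 15,000 parameters at p = 100 for both the GHD and MSGHD versions, almost all of them coming from the scale matrices; working in a reduced set of eigen-directions is what keeps that count manageable.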
