On the estimation of high-dimensional regression models with binary covariates
Last modified: 2018-05-18
Abstract
In this paper we address the problem of estimating the parameters of high-dimensional regression models whose covariates are binary. We propose a new procedure that combines a clustering of the binary covariates with group-penalized regression for estimating the model parameters. A simulation study shows the good performance of the methodology.
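The abstract does not specify which clustering method or which group penalty is used, so the following is only a minimal sketch of the two building blocks under stated assumptions: binary covariates are grouped with a simple greedy "leader" rule based on Hamming distance (an illustrative stand-in, not the authors' clustering), and coefficients are then shrunk group-wise with the group lasso soft-thresholding operator. The function names, the toy data, and the values of `max_dist` and `lam` are all hypothetical.

```python
import math

def hamming(u, v):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def leader_clusters(columns, max_dist):
    """Greedy 'leader' clustering of binary columns (illustrative stand-in).

    Each column joins the first cluster whose leader (its first member) is
    within max_dist in Hamming distance; otherwise it starts a new cluster.
    Returns a list of groups, each a list of column indices.
    """
    groups = []
    for j, col in enumerate(columns):
        for g in groups:
            if hamming(columns[g[0]], col) <= max_dist:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the group lasso penalty lam * sum_g ||beta_g||_2."""
    out = list(beta)
    for g in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in g))
        # A whole group is set to zero when its norm does not exceed lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * beta[j]
    return out

# Toy data (hypothetical): four binary covariates observed on six units,
# stored column by column.
columns = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],  # differs from column 0 in one position
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],  # differs from column 2 in one position
]
groups = leader_clusters(columns, max_dist=1)

# Hypothetical unpenalized coefficients, shrunk group by group.
shrunk = group_soft_threshold([3.0, 4.0, 0.3, 0.4], groups, lam=1.0)
```

In this toy case the first two covariates form one group and the last two another. The first group is shrunk but kept, while the second is removed entirely, which is the group-level selection behaviour that a group penalty is meant to provide.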