Regularization and model selection with categorical predictors and effect modifiers in generalized linear models. (English) Zbl 07257900

Summary: Varying-coefficient models with categorical effect modifiers are considered within the framework of generalized linear models. We distinguish between nominal and ordinal effect modifiers, and propose adequate Lasso-type regularization techniques that allow for (1) selection of relevant covariates, and (2) identification of coefficient functions that are actually varying with the level of a potentially effect modifying factor. For computation, a penalized iteratively reweighted least squares algorithm is presented. We investigate large sample properties of the penalized estimates; in simulation studies, we show that the proposed approaches perform very well for finite samples, too. In addition, the presented methods are compared with alternative procedures, and applied to real-world data.
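The summary does not give the penalty itself, so the following is only a minimal sketch of what a Lasso-type penalty for categorical effect modifiers could look like, in the spirit of the fused Lasso. It assumes one coefficient per covariate and per level of the effect modifier. Differences between level-specific coefficients are penalized so that a coefficient function can collapse to a constant: all pairwise differences for a nominal modifier, only adjacent differences for an ordinal one. Absolute values are penalized so that a covariate can drop out entirely. The function name `fusion_penalty`, the argument `selection_weight`, and the example values are hypothetical and are not taken from the paper or from gvcm.cat.

```python
from itertools import combinations


def fusion_penalty(beta, ordinal=False, selection_weight=1.0):
    """Sketch of a Lasso-type penalty for level-specific coefficients.

    beta[j][r] is the (hypothetical) coefficient of covariate j at level r
    of the categorical effect modifier. This form is an assumption made for
    illustration, not the paper's exact definition.
    """
    total = 0.0
    for coefs in beta:
        k = len(coefs)
        if ordinal:
            # Ordinal modifier: penalize differences of adjacent levels only.
            pairs = [(r, r - 1) for r in range(1, k)]
        else:
            # Nominal modifier: penalize all pairwise level differences.
            pairs = list(combinations(range(k), 2))
        fusion = sum(abs(coefs[r] - coefs[s]) for r, s in pairs)
        # Selection term: shrinking every level to zero removes the covariate.
        selection = sum(abs(b) for b in coefs)
        total += fusion + selection_weight * selection
    return total


# Hypothetical coefficients: covariate 0 has a constant effect across the
# three levels, covariate 1 is irrelevant, covariate 2 truly varies.
beta = [
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.5, 1.0, 2.0],
]

nominal_value = fusion_penalty(beta)
ordinal_value = fusion_penalty(beta, ordinal=True)
print(nominal_value, ordinal_value)
```

In this sketch a constant coefficient function contributes nothing to the fusion term and an all-zero one contributes nothing at all, which mirrors the two goals named in the summary: identifying coefficients that actually vary, and selecting relevant covariates.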

MSC:

62-XX Statistics

Software:

R; gvcm.cat; CasANOVA