
Variable selection in semiparametric regression modeling. (English) Zbl 1132.62027
Summary: We are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. Semiparametric variable selection is therefore much more challenging than parametric variable selection (e.g., in linear and generalized linear models), because traditional procedures such as stepwise regression and best subset selection now require a separate model selection for the nonparametric components of each submodel. This leads to a very heavy computational burden.
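To make the computational burden concrete, the short sketch below counts the fits implied by best subset selection when every submodel needs its own smoothing-parameter search. The cost model is a simplifying assumption, not taken from the paper: with d candidate parametric covariates there are 2^d submodels, and each is assumed to be refit once per value on a hypothetical bandwidth grid of size `n_bandwidths`.

```python
def best_subset_cost(d: int, n_bandwidths: int) -> tuple[int, int]:
    """Return (number of submodels, number of smoother fits).

    Hypothetical cost model: all 2**d subsets of the d parametric
    covariates are fitted, and each subset repeats a bandwidth search
    for the nonparametric component over n_bandwidths grid values.
    """
    n_subsets = 2 ** d
    return n_subsets, n_subsets * n_bandwidths


for d in (5, 10, 20):
    subsets, fits = best_subset_cost(d, n_bandwidths=20)
    print(f"d={d:2d}: {subsets:>9,} submodels, {fits:>11,} smoother fits")
```

Under these assumptions, d = 20 with a 20-point grid already implies about 21 million nonparametric fits, which is the kind of burden a single penalized fit is meant to avoid.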
We propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood, and we establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we show the asymptotic normality of the resulting estimate and further demonstrate that the proposed procedures perform as well as an oracle procedure. A semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and show that its limiting null distribution is a chi-square distribution that does not depend on the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed variable selection procedures.
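The summary does not name the penalty or the smoother, so the following is only a minimal sketch under stated assumptions of what a nonconcave penalized fit can look like. It takes the partially linear model Y = X^T β + g(T) + ε as one example of a semiparametric model, removes g by profiling with a Nadaraya-Watson smoother (Gaussian kernel, fixed bandwidth `h`), and then minimizes (1/2)‖ỹ − X̃β‖² + n Σ_j p_λ(|β_j|) with the SCAD penalty of Fan and Li (2001, a = 3.7) via their local quadratic approximation (LQA). The names `scad_derivative`, `nw_smoother` and `scad_plm`, the fixed `h` and `lam`, and the deletion threshold `eps` are hypothetical choices for illustration, not the authors' implementation; in practice `lam` and `h` would be chosen by a data-driven criterion.

```python
import numpy as np


def scad_derivative(theta, lam, a=3.7):
    """SCAD penalty derivative p'_lam(|theta|) from Fan and Li (2001)."""
    t = np.abs(theta)
    # lam on [0, lam], decays linearly to 0 on (lam, a*lam], zero beyond a*lam
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))


def nw_smoother(t, h):
    """Nadaraya-Watson smoother matrix (Gaussian kernel; an assumed choice)."""
    u = (t[:, None] - t[None, :]) / h
    w = np.exp(-0.5 * u**2)
    return w / w.sum(axis=1, keepdims=True)  # row i estimates E[. | T = t_i]


def scad_plm(y, X, t, h, lam, a=3.7, n_iter=100, tol=1e-8, eps=1e-6):
    """SCAD-penalized profile least squares for Y = X'beta + g(T) + noise.

    Minimizes 0.5 * ||y~ - X~ beta||^2 + n * sum_j p_lam(|beta_j|) by LQA,
    where y~ and X~ have their kernel estimates of E[. | T] removed.
    """
    n, d = X.shape
    S = nw_smoother(t, h)
    Xt = X - S @ X  # profile out the nonparametric component
    yt = y - S @ y
    beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]  # unpenalized starting value
    for _ in range(n_iter):
        active = np.abs(beta) > eps  # tiny coefficients are deleted for good
        beta_new = np.zeros(d)
        if active.any():
            Xa, ba = Xt[:, active], beta[active]
            sigma = np.diag(scad_derivative(ba, lam, a) / np.abs(ba))
            # LQA update: (X~'X~ + n Sigma) beta = X~'y~ on the active set
            beta_new[active] = np.linalg.solve(Xa.T @ Xa + n * sigma, Xa.T @ yt)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    g_hat = S @ (y - X @ beta)  # smooth the partial residuals to estimate g
    return beta, g_hat


# Usage on simulated data (this design is illustrative, not from the paper)
rng = np.random.default_rng(0)
n = 400
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, beta_true.size))
t = rng.uniform(0.0, 1.0, n)
y = X @ beta_true + np.sin(2 * np.pi * t) + rng.standard_normal(n)
beta_hat, g_hat = scad_plm(y, X, t, h=0.1, lam=0.3)
print(np.round(beta_hat, 2))
```

In this sketch, coefficients whose magnitude falls below `eps` during the LQA iterations are set exactly to zero and stay there, which is what yields a sparse estimate. Coefficients larger than aλ receive no penalty at all, reflecting the near-unbiasedness for large effects that motivates SCAD.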

62G08 Nonparametric regression and quantile regression
62G10 Nonparametric hypothesis testing
62G20 Asymptotic properties of nonparametric inference
62J05 Linear regression; mixed models
65C05 Monte Carlo methods
62H12 Estimation in multivariate analysis