Nonparametric estimation of the link function including variable selection. (English) Zbl 1322.62003

Summary: Nonparametric methods for estimating the link function in generalized linear models can avoid bias in the regression parameters. However, estimation of the link has typically relied on the full model, which includes all predictors. When the number of predictors is large, these methods fail because the full model cannot be estimated. The present article proposes a boosting-type method that simultaneously selects predictors and estimates the link function. The method performs quite well in simulations and real data examples.

MSC:

62-04 Software, source code, etc. for problems pertaining to statistics
62-07 Data analysis (statistics) (MSC2010)
62G05 Nonparametric estimation

References:

[1] Antoniadis, A., Gregoire, G., McKeague, I.W.: Bayesian estimation in single-index models. Stat. Sin. 14, 1147–1164 (2004) · Zbl 1060.62031
[2] Bühlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting (with discussion). Stat. Sci. 22, 477–505 (2007) · Zbl 1246.62163 · doi:10.1214/07-STS242
[3] Bühlmann, P., Yu, B.: Boosting with the L2 loss: regression and classification. J. Am. Stat. Assoc. 98, 324–339 (2003) · Zbl 1041.62029 · doi:10.1198/016214503000125
[4] Carroll, R.J., Fan, J., Gijbels, I., Wand, M.P.: Generalized partially linear single-index models. J. Am. Stat. Assoc. 92, 477–489 (1997) · Zbl 0890.62053 · doi:10.1080/01621459.1997.10474001
[5] Cui, X., Härdle, W.K., Zhu, L.: Generalized single index models: the EFM approach. Discussion Paper 50, SFB 649, Humboldt University Berlin, Economic Risk (2009)
[6] Czado, C., Santner, T.: The effect of link misspecification on binary regression inference. J. Stat. Plan. Inference 33, 213–231 (1992) · Zbl 0781.62037 · doi:10.1016/0378-3758(92)90069-5
[7] Deb, P., Trivedi, P.K.: Demand for medical care by the elderly: a finite mixture approach. J. Appl. Econom. 12, 313–336 (1997) · doi:10.1002/(SICI)1099-1255(199705)12:3<313::AID-JAE440>3.0.CO;2-G
[8] Dierckx, P.: Curve and Surface Fitting with Splines. Oxford Science Publications, Oxford (1993) · Zbl 0782.41016
[9] Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–499 (2004) · Zbl 1091.62054 · doi:10.1214/009053604000000067
[10] Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996) · Zbl 0955.62562 · doi:10.1214/ss/1038425655
[11] Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001) · Zbl 1073.62547 · doi:10.1198/016214501753382273
[12] Friedman, J.H., Stützle, W.: Projection pursuit regression. J. Am. Stat. Assoc. 76, 817–823 (1981) · doi:10.1080/01621459.1981.10477729
[13] Gaiffas, S., Lecue, G.: Optimal rates and adaptation in the single-index model using aggregation. Electron. J. Stat. 1, 538–573 (2007) · Zbl 1320.62091 · doi:10.1214/07-EJS077
[14] Gertheiss, J., Hogger, S., Oberhauser, C., Tutz, G.: Selection of ordinally scaled independent variables with applications to international classification of functioning core sets. J. R. Stat. Soc. Ser. C (2011) · doi:10.1111/j.1467-9876.2010.00753.x
[15] Härdle, W., Hall, P., Ichimura, H.: Optimal smoothing in single-index models. Ann. Stat. 21, 157–178 (1993) · Zbl 0770.62049
[16] Hastie, T.: Comment: boosting algorithms: regularization, prediction and model fitting. Stat. Sci. 22(4) (2007) · Zbl 1246.62167
[17] Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M., Hofner, B.: mboost: model-based boosting. R package version 2.0-0 (2009)
[18] Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M., Hofner, B.: Model-based boosting 2.0. J. Mach. Learn. Res. 11, 2109–2113 (2010) · Zbl 1242.68002
[19] Hristache, M., Juditsky, A., Spokoiny, V.: Direct estimation of the index coefficient in a single-index model. Ann. Stat. 29, 595–623 (2001) · Zbl 1012.62043 · doi:10.1214/aos/1009210681
[20] James, G.M., Radchenko, P.: A generalized Dantzig selector with shrinkage tuning. Biometrika 127–142 (2008) · Zbl 1163.62054
[21] Klein, R.L., Spady, R.H.: An efficient semiparametric estimator for binary response models. Econometrica 61, 387–421 (1993) · Zbl 0783.62100 · doi:10.2307/2951556
[22] Leitenstorfer, F., Tutz, G.: Estimation of single-index models based on boosting techniques. Stat. Model. 11, 183–197 (2011) · Zbl 05933700 · doi:10.1177/1471082X1001100302
[23] Lokhorst, J., Venables, B., Turlach, B., Maechler, M.: lasso2: L1 constrained estimation aka ’lasso’. R package version 1.2-6 (2007)
[24] Maron, M.: Threshold effect of eucalypt density on an aggressive avian competitor. Biol. Conserv. 136, 100–107 (2007) · doi:10.1016/j.biocon.2006.11.007
[25] Muggeo, V.M.R., Ferrara, G.: Fitting generalized linear models with unspecified link function: a p-spline approach. Comput. Stat. Data Anal. 52(5) (2008) · Zbl 1452.62541
[26] Naik, P.A., Tsai, C.-L.: Single-index model selection. Biometrika 88, 821–832 (2001) · Zbl 0988.62042 · doi:10.1093/biomet/88.3.821
[27] Park, M.Y., Hastie, T.: An l1 regularization-path algorithm for generalized linear models. Preprint, Department of Statistics, Stanford University (2006)
[28] Powell, J.L., Stock, J.H., Stoker, T.M.: Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430 (1989) · Zbl 0683.62070 · doi:10.2307/1913713
[29] Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer, New York (2005)
[30] Ramsay, J.O., Wickham, H., Graves, S., Hooker, G.: fda: functional data analysis. R package version 2.2.5 (2010)
[31] Ruckstuhl, A., Welsh, A.: Reference bands for nonparametrically estimated link functions. J. Comput. Graph. Stat. 8(4), 699–714 (1999)
[32] Stoker, T.M.: Consistent estimation of scaled coefficients. Econometrica 54, 1461–1481 (1986) · Zbl 0628.62105 · doi:10.2307/1914309
[33] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996) · Zbl 0850.62538
[34] Turlach, B.A.: quadprog: functions to solve quadratic programming problems. R package version 1.4-11, S original by Berwin A. Turlach, R port by Andreas Weingessel (2009)
[35] Tutz, G., Binder, H.: Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics 62, 961–971 (2006) · Zbl 1116.62075 · doi:10.1111/j.1541-0420.2006.00578.x
[36] Weisberg, S., Welsh, A.H.: Adapting for the missing link. Ann. Stat. 22, 1674–1700 (1994) · Zbl 0828.62059 · doi:10.1214/aos/1176325749
[37] Yu, Y., Ruppert, D.: Penalized spline estimation for partially linear single-index models. J. Am. Stat. Assoc. 97, 1042–1054 (2002) · Zbl 1045.62035 · doi:10.1198/016214502388618861
[38] Zeileis, A.: Object-oriented computation of sandwich estimators. J. Stat. Softw. 16(9) (2006) · Zbl 1445.62316