
Boosting for high-dimensional linear models. (English) Zbl 1095.62077
Summary: We prove that boosting with the squared error loss, \(L_2\)-Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as \(O(\exp(\text{sample size}))\), assuming that the true underlying regression function is sparse in terms of the \(\ell_1\)-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the \(\ell_1\)-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes \(L_2\)-Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate \(L_2\)-Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
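
The procedure summarized above can be illustrated with a minimal Python sketch. The summary does not spell out the algorithmic details, so the following are assumptions made for illustration only: the base learner is componentwise linear least squares (one predictor is updated per iteration), each update is shrunk by a step size `nu`, and the stopping iteration minimizes a corrected AIC in the style of Hurvich, Simonoff and Tsai (1998), with degrees of freedom taken as the trace of the boosting hat matrix. The function name `l2_boost_aicc` and its parameters are hypothetical and do not come from the paper or from any library.

```python
import numpy as np


def l2_boost_aicc(X, y, n_steps=200, nu=0.1):
    """Componentwise L2-Boosting with a corrected-AIC stopping rule (sketch).

    Assumptions (not taken from the summary): componentwise least-squares
    base learner, shrinkage factor `nu`, and an AICc-type criterion.
    Returns (intercept, coefficients, selected number of iterations).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Center the predictors; the intercept is recovered at the end.
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    sq_norms = np.sum(Xc ** 2, axis=0)
    sq_norms[sq_norms == 0.0] = np.inf  # constant columns are never selected

    coef = np.zeros(p)
    resid = y - y.mean()            # residuals of the initial mean fit
    B = np.full((n, n), 1.0 / n)    # hat matrix of the initial mean fit

    best_aicc, best_coef, best_m = np.inf, coef.copy(), 0
    for m in range(1, n_steps + 1):
        # Pick the predictor that reduces the residual sum of squares most.
        inner = Xc.T @ resid
        j = int(np.argmax(inner ** 2 / sq_norms))
        step = nu * inner[j] / sq_norms[j]
        coef[j] += step
        xj = Xc[:, j]
        resid = resid - step * xj

        # Hat-matrix recursion: B_m = B_{m-1} + nu * H_j (I - B_{m-1}),
        # where H_j = xj xj^T / ||xj||^2.
        B += nu * np.outer(xj, xj - xj @ B) / sq_norms[j]

        df = np.trace(B)
        sigma2 = np.mean(resid ** 2)
        if sigma2 <= 0.0 or df + 2.0 >= n:
            break  # the criterion is undefined beyond this point
        aicc = np.log(sigma2) + (1.0 + df / n) / (1.0 - (df + 2.0) / n)
        if aicc < best_aicc:
            best_aicc, best_coef, best_m = aicc, coef.copy(), m

    intercept = y.mean() - x_mean @ best_coef
    return intercept, best_coef, best_m
```

A call such as `intercept, coef, m_stop = l2_boost_aicc(X, y)` runs the boosting path once and reads the stopping iteration off the criterion, which is the computational point made in the summary: no repeated runs for cross-validation are needed. The sketch stores an \(n \times n\) hat matrix, so it is only practical for moderate sample sizes; the number of predictors can be much larger than \(n\).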

62J05 Linear regression; mixed models
62B10 Statistical aspects of information-theoretic topics
65C60 Computational problems in statistics (MSC2010)
49M15 Newton-type methods
62P10 Applications of statistics to biology and medical sciences; meta analysis
68Q32 Computational learning theory
62J99 Linear inference, regression