Boosting algorithms: regularization, prediction and model fitting. (English) Zbl 1246.62163

Summary: We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
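The summary's notions of regularization, variable selection and degrees of freedom can be illustrated with componentwise L2Boosting. What follows is a minimal Python sketch under stated assumptions, not the mboost implementation. It assumes linear least-squares base learners on centered covariates and a centered response, so no offset is fitted.

Each iteration fits every covariate separately to the current residuals and updates only the best one, by a shrunken step of length nu. The sketch also tracks the boosting operator B_m = I - (I - nu H_{s_m}) ... (I - nu H_{s_1}), where H_{s_k} is the hat matrix of the covariate selected at step k. Its trace serves as the degrees of freedom in a corrected AIC of the kind proposed by Hurvich, Simonoff and Tsai [49].

The function name `componentwise_l2boost` and its parameters `mstop` and `nu` are hypothetical choices made for this illustration.

```python
import math


def componentwise_l2boost(X, y, mstop=100, nu=0.1):
    """Componentwise L2Boosting with simple linear base learners (sketch).

    X: n rows of p covariates, each column assumed centered and non-constant.
    y: n responses, assumed centered so that no offset is fitted.
    Returns the coefficient list and, per iteration, a (df, AICc) pair.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    ss = [sum(v * v for v in c) for c in cols]   # x_j^T x_j
    beta = [0.0] * p
    u = list(y)                                   # current residuals
    # P = (I - nu H_m) ... (I - nu H_1), so the boosting operator is B_m = I - P
    P = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    path = []
    for _ in range(mstop):
        # Fit the residuals on each covariate separately; keep the best one.
        best_j, best_rss, best_b = 0, math.inf, 0.0
        for j in range(p):
            b = sum(c * r for c, r in zip(cols[j], u)) / ss[j]
            rss = sum((r - b * c) ** 2 for c, r in zip(cols[j], u))
            if rss < best_rss:
                best_j, best_rss, best_b = j, rss, b
        # Shrunken update of the selected coefficient and of the residuals.
        x = cols[best_j]
        beta[best_j] += nu * best_b
        u = [r - nu * best_b * c for c, r in zip(x, u)]
        # Left-multiply P by (I - nu H_j) with H_j = x x^T / (x^T x).
        row = [sum(x[i] * P[i][k] for i in range(n)) for k in range(n)]
        for i in range(n):
            scale = nu * x[i] / ss[best_j]
            for k in range(n):
                P[i][k] -= scale * row[k]
        # Degrees of freedom df = trace(B_m) = n - trace(P).
        df = n - sum(P[i][i] for i in range(n))
        sigma2 = sum(r * r for r in u) / n
        # Corrected AIC; requires df + 2 < n and sigma2 > 0.
        aicc = math.log(sigma2) + (1 + df / n) / (1 - (df + 2) / n)
        path.append((df, aicc))
    return beta, path
```

The iteration with the smallest AICc can serve as a data-driven stopping point, for example `min(range(len(path)), key=lambda m: path[m][1]) + 1`. Stopping early is what regularizes the fit. Covariates that are never selected keep a zero coefficient, which is how variable selection arises.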


62J12 Generalized linear models (logistic models)
62N02 Estimation in survival analysis and censored data
65C60 Computational problems in statistics (MSC2010)
62-04 Software, source code, etc. for problems pertaining to statistics


[1] Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9 1545-1588.
[2] Audrino, F. and Barone-Adesi, G. (2005). Functional gradient descent for financial time series with an application to the measurement of market risk. J. Banking and Finance 29 959-977. · Zbl 1133.91441
[3] Audrino, F. and Barone-Adesi, G. (2005). A multivariate FGD technique to improve VaR computation in equity markets. Comput. Management Sci. 2 87-106. · Zbl 1133.91441
[4] Audrino, F. and Bühlmann, P. (2003). Volatility estimation with functional gradient descent for very high-dimensional financial time series. J. Comput. Finance 6 65-89.
[5] Bartlett, P. (2003). Prediction algorithms: Complexity, concentration and convexity. In Proceedings of the 13th IFAC Symp. on System Identification.
[6] Bartlett, P. L., Jordan, M. and McAuliffe, J. (2006). Convexity, classification, and risk bounds. J. Amer. Statist. Assoc. 101 138-156. · Zbl 1118.62330
[7] Bartlett, P. and Traskin, M. (2007). AdaBoost is consistent. J. Mach. Learn. Res. 8 2347-2368. · Zbl 1222.68142
[8] Benner, A. (2002). Application of “aggregated classifiers” in survival time studies. In Proceedings in Computational Statistics (COMPSTAT) (W. Härdle and B. Rönz, eds.) 171-176. Physica-Verlag, Heidelberg.
[9] Binder, H. (2006). GAMBoost: Generalized additive models by likelihood based boosting. R package version 0.9-3. Available at http://CRAN.R-project.org.
[10] Bissantz, N., Hohage, T., Munk, A. and Ruymgaart, F. (2007). Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J. Numer. Anal. 45 2610-2636. · Zbl 1234.62062
[11] Blake, C. L. and Merz, C. J. (1998). UCI repository of machine learning databases. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
[12] Blanchard, G., Lugosi, G. and Vayatis, N. (2003). On the rate of convergence of regularized boosting classifiers. J. Machine Learning Research 4 861-894. · Zbl 1083.68109
[13] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373-384. · Zbl 0862.62059
[14] Breiman, L. (1996). Bagging predictors. Machine Learning 24 123-140. · Zbl 0858.68080
[15] Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801-849. · Zbl 0934.62064
[16] Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493-1517.
[17] Breiman, L. (2001). Random forests. Machine Learning 45 5-32. · Zbl 1007.68152
[18] Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559-583. · Zbl 1095.62077
[19] Bühlmann, P. (2007). Twin boosting: Improved feature selection and prediction. Technical report, ETH Zürich. Available at ftp://ftp.stat.math.ethz.ch/Research-Reports/Other-Manuscripts/buhlmann/TwinBoosting1.pdf.
[20] Bühlmann, P. and Lutz, R. (2006). Boosting algorithms: With an application to bootstrapping multivariate time series. In The Frontiers in Statistics (J. Fan and H. Koul, eds.) 209-230. Imperial College Press, London. · Zbl 1119.62049
[21] Bühlmann, P. and Yu, B. (2000). Discussion on “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 377-386.
[22] Bühlmann, P. and Yu, B. (2003). Boosting with the $L_2$ loss: Regression and classification. J. Amer. Statist. Assoc. 98 324-339. · Zbl 1041.62029
[23] Bühlmann, P. and Yu, B. (2006). Sparse boosting. J. Machine Learning Research 7 1001-1024. · Zbl 1222.68155
[24] Buja, A., Stuetzle, W. and Shen, Y. (2005). Loss functions for binary class probability estimation: Structure and applications. Technical report, Univ. Washington. Available at http://www.stat.washington.edu/wxs/Learning-papers/paper-proper-scoring.pdf.
[25] Dettling, M. (2004). BagBoosting for tumor classification with gene expression data. Bioinformatics 20 3583-3593.
[26] Dettling, M. and Bühlmann, P. (2003). Boosting for tumor classification with gene expression data. Bioinformatics 19 1061-1069.
[27] DiMarzio, M. and Taylor, C. (2008). On boosting kernel regression. J. Statist. Plann. Inference. · Zbl 1182.62091
[28] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407-499. · Zbl 1091.62054
[29] Freund, Y. and Schapire, R. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory. Springer, Berlin. · Zbl 0880.68103
[30] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.
[31] Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139. · Zbl 0880.68103
[32] Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189-1232. · Zbl 1043.62034
[33] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337-407. · Zbl 1106.62323
[34] Garcia, A. L., Wagner, K., Hothorn, T., Koebnick, C., Zunft, H. J. and Trippo, U. (2005). Improved prediction of body fat by measuring skinfold thickness, circumferences, and bone breadths. Obesity Research 13 626-634.
[35] Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, M., Iacus, S., Irizarry, R., Leisch, F., Li, C., Mächler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. and Zhang, J. (2004). Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5 R80.
[36] Green, P. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, New York. · Zbl 0832.62032
[37] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional predictor selection and the virtue of over-parametrization. Bernoulli 10 971-988. · Zbl 1055.62078
[38] Hansen, M. and Yu, B. (2001). Model selection and minimum description length principle. J. Amer. Statist. Assoc. 96 746-774. · Zbl 1017.62004
[39] Hastie, T. and Efron, B. (2004). lars: Least angle regression, lasso and forward stagewise. R package version 0.9-7. Available at http://CRAN.R-project.org.
[40] Hastie, T. and Tibshirani, R. (1986). Generalized additive models (with discussion). Statist. Sci. 1 297-318. · Zbl 0645.62068
[41] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London. · Zbl 0747.62061
[42] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York. · Zbl 0973.62007
[43] Hothorn, T. and Bühlmann, P. (2007). mboost: Model-based boosting. R package version 0.5-8. Available at http://CRAN.R-project.org/.
[44] Hothorn, T. and Bühlmann, P. (2006). Model-based boosting in high dimensions. Bioinformatics 22 2828-2829.
[45] Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A. and van der Laan, M. (2006). Survival ensembles. Biostatistics 7 355-373. · Zbl 1170.62385
[46] Hothorn, T., Hornik, K. and Zeileis, A. (2006). party: A laboratory for recursive part(y)itioning. R package version 0.9-11. Available at http://CRAN.R-project.org/.
[47] Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Statist. 15 651-674.
[48] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression. Statist. Sinica. · Zbl 1255.62198
[49] Hurvich, C., Simonoff, J. and Tsai, C.-L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. Roy. Statist. Soc. Ser. B 60 271-293. · Zbl 0909.62039
[50] Iyer, R., Lewis, D., Schapire, R., Singer, Y. and Singhal, A. (2000). Boosting for document routing. In Proceedings of CIKM-00, 9th ACM Int. Conf. on Information and Knowledge Management (A. Agah, J. Callan and E. Rundensteiner, eds.). ACM Press, New York.
[51] Jiang, W. (2004). Process consistency for AdaBoost (with discussion). Ann. Statist. 32 13-29, 85-134. · Zbl 1105.62316
[52] Kearns, M. and Valiant, L. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. J. Assoc. Comput. Machinery 41 67-95. · Zbl 0807.68073
[53] Koltchinskii, V. and Panchenko, D. (2002). Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30 1-50. · Zbl 1012.62004
[54] Leitenstorfer, F. and Tutz, G. (2006). Smoothing with curvature constraints based on boosting techniques. In Proceedings in Computational Statistics (COMPSTAT) (A. Rizzi and M. Vichi, eds.). Physica-Verlag, Heidelberg. · Zbl 1162.62337
[55] Leitenstorfer, F. and Tutz, G. (2007). Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics 8 654-673. · Zbl 1118.62125
[56] Leitenstorfer, F. and Tutz, G. (2007). Knot selection by boosting techniques. Comput. Statist. Data Anal. 51 4605-4621. · Zbl 1162.62337
[57] Lozano, A., Kulkarni, S. and Schapire, R. (2006). Convergence and consistency of regularized boosting algorithms with stationary $\beta$-mixing observations. In Advances in Neural Information Processing Systems (Y. Weiss, B. Schölkopf and J. Platt, eds.) 18. MIT Press.
[58] Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods (with discussion). Ann. Statist. 32 30-55, 85-134. · Zbl 1105.62319
[59] Lutz, R. and Bühlmann, P. (2006). Boosting for high-multivariate responses in high-dimensional linear regression. Statist. Sinica 16 471-494. · Zbl 1096.62057
[60] Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 41 3397-3415. · Zbl 0842.94004
[61] Mannor, S., Meir, R. and Zhang, T. (2003). Greedy algorithms for classification-consistency, convergence rates, and adaptivity. J. Machine Learning Research 4 713-741. · Zbl 1105.68388
[62] Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Functional gradient techniques for combining hypotheses. In Advances in Large Margin Classifiers (A. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans, eds.) 221-246. MIT Press, Cambridge.
[63] McCaffrey, D. F., Ridgeway, G. and Morral, A. R. G. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods 9 403-425.
[64] Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. J. Machine Learning Research 8 409-439. · Zbl 1222.68261
[65] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[66] Meir, R. and Rätsch, G. (2003). An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (S. Mendelson and A. Smola, eds.). Springer, Berlin. · Zbl 1019.68092
[67] Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-403. · Zbl 0962.65036
[68] Park, M.-Y. and Hastie, T. (2007). An L1 regularization-path algorithm for generalized linear models. J. Roy. Statist. Soc. Ser. B 69 659-677.
[69] R Development Core Team (2006). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org.
[70] Rätsch, G., Onoda, T. and Müller, K. (2001). Soft margins for AdaBoost. Machine Learning 42 287-320. · Zbl 0969.68128
[71] Ridgeway, G. (1999). The state of boosting. Comput. Sci. Statistics 31 172-181.
[72] Ridgeway, G. (2000). Discussion on “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie, R. Tibshirani. Ann. Statist. 28 393-400. · Zbl 1106.62323
[73] Ridgeway, G. (2002). Looking for lumps: Boosting and bagging for density estimation. Comput. Statist. Data Anal. 38 379-392. · Zbl 1072.62560
[74] Ridgeway, G. (2006). gbm: Generalized boosted regression models. R package version 1.5-7. Available at http://www.i-pensieri.com/gregr/gbm.shtml.
[75] Schapire, R. (1990). The strength of weak learnability. Machine Learning 5 197-227.
[76] Schapire, R. (2002). The boosting approach to machine learning: An overview. In Nonlinear Estimation and Classification. Lecture Notes in Statist. 171 149-171. Springer, New York. · Zbl 1142.62372
[77] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651-1686. · Zbl 0929.62069
[78] Schapire, R. and Singer, Y. (2000). Boostexter: A boosting-based system for text categorization. Machine Learning 39 135-168. · Zbl 0951.68561
[79] Southwell, R. (1946). Relaxation Methods in Theoretical Physics. Oxford, at the Clarendon Press. · Zbl 0074.10805
[80] Street, W. N., Mangasarian, O. L. and Wolberg, W. H. (1995). An inductive learning approach to prognostic prediction. In Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.
[81] Temlyakov, V. (2000). Weak greedy algorithms. Adv. Comput. Math. 12 213-227. · Zbl 0964.65009
[82] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[83] Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA. · Zbl 0409.62003
[84] Tutz, G. and Binder, H. (2006). Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics 62 961-971. · Zbl 1116.62075
[85] Tutz, G. and Binder, H. (2007). Boosting Ridge regression. Comput. Statist. Data Anal. 51 6044-6059. · Zbl 1330.62294
[86] Tutz, G. and Hechenbichler, K. (2005). Aggregating classifiers with ordinal response structure. J. Statist. Comput. Simul. 75 391-408. · Zbl 1061.62092
[87] Tutz, G. and Leitenstorfer, F. (2007). Generalized smooth monotonic regression in additive modelling. J. Comput. Graph. Statist. 16 165-188. · Zbl 1118.62125
[88] Tutz, G. and Reithinger, F. (2007). Flexible semiparametric mixed models. Statistics in Medicine 26 2872-2900.
[89] van der Laan, M. and Robins, J. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer, New York. · Zbl 1013.62034
[90] West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J., Marks, J. and Nevins, J. (2001). Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. USA 98 11462-11467.
[91] Yao, Y., Rosasco, L. and Caponnetto, A. (2007). On early stopping in gradient descent learning. Constr. Approx. 26 289-315. · Zbl 1125.62035
[92] Zhang, T. and Yu, B. (2005). Boosting with early stopping: Convergence and consistency. Ann. Statist. 33 1538-1579. · Zbl 1078.62038
[93] Zhao, P. and Yu, B. (2007). Stagewise Lasso. J. Mach. Learn. Res. 8 2701-2726. · Zbl 1222.68345
[94] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Machine Learning Research 7 2541-2563. · Zbl 1222.62008
[95] Zhu, J., Rosset, S., Zou, H. and Hastie, T. (2005). Multiclass AdaBoost. Technical report, Stanford Univ. Available at http://www-stat.stanford.edu/~hastie/Papers/samme.pdf. · Zbl 1245.62080
[96] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.