# zbMATH — the first resource for mathematics

Least angle regression. (With discussion). (English) Zbl 1091.62054
Summary: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived:
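The greedy refitting that traditional Forward Selection performs can be sketched as follows (an illustrative sketch only; the function name and interface are our own, not from the paper):

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedy forward selection: repeatedly add the covariate that is most
    correlated with the current residual, refit least squares on the active
    set, and stop after k covariates. (Illustrative sketch only.)"""
    n, p = X.shape
    active = []
    residual = y.copy()
    beta = np.zeros(0)
    for _ in range(k):
        # pick the unused column most correlated with the current residual
        scores = [abs(X[:, j] @ residual) if j not in active else -np.inf
                  for j in range(p)]
        active.append(int(np.argmax(scores)))
        # refit ordinary least squares on the active set, update the residual
        beta, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ beta
    return active, beta
```

The full refit after each addition is what makes this procedure "greedy": a variable, once admitted, immediately takes its entire least-squares coefficient, which is exactly the behavior LARS moderates.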
1. A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods.
2. A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm.
3. A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a $$C_p$$ estimate of prediction error; this allows a principled choice among the range of possible LARS estimates.

LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
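Two of the ingredients above can be sketched directly: incremental Forward Stagewise (take many tiny coefficient steps instead of full refits) and the Mallows-type $$C_p$$ formula built from the degrees-of-freedom approximation. This is a minimal sketch assuming standardized covariates; the function names, step size, and step count are our own, and the paper's LARS-based path algorithm is far more efficient than this naive loop:

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, steps=600):
    """Incremental forward stagewise regression: at each step, nudge the
    coefficient of the covariate most correlated with the residual by
    eps * sign(correlation). A sketch of the idea only; in the limit of
    small eps its coefficient path closely resembles the Lasso path."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()
    for _ in range(steps):
        corr = X.T @ residual
        j = int(np.argmax(np.abs(corr)))       # most correlated covariate
        step = eps * np.sign(corr[j])          # tiny move, no full refit
        beta[j] += step
        residual -= step * X[:, j]
    return beta

def cp_estimate(rss, df, n, sigma2):
    """Mallows-type C_p estimate of prediction error used in the paper:
    C_p = RSS / sigma^2 - n + 2 * df,
    where df is the (approximated) degrees of freedom of the fit."""
    return rss / sigma2 - n + 2 * df
```

With eps small enough, the stagewise coefficients creep toward the least-squares solution while never letting any single variable dominate a step, which is the "less greedy" behavior the summary describes.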

##### MSC:
- 62J05 Linear regression; mixed models
- 62J07 Ridge regression; shrinkage estimators (Lasso)
- 65C60 Computational problems in statistics (MSC2010)
##### Keywords:
Lasso; boosting; coefficient paths; variable selection
##### Software:
AdaBoost.MH; alr3; ElemStatLearn; gbm; UCI-ml