# zbMATH — the first resource for mathematics

A significance test for the lasso. (English) Zbl 1305.62254
A linear regression model is considered, $y=X\beta^*+\varepsilon,\quad \varepsilon\sim N(0, \sigma^2I),$ where $$y\in \mathbb{R}^n$$ is an outcome vector, $$X$$ is a design matrix, and $$\beta^*\in \mathbb{R}^p$$ are unknown coefficients to be estimated. The lasso estimator $$\hat {\beta} =\hat {\beta} (\lambda)$$ minimizes the objective function $Q(\beta; \lambda)=\frac{1}{2} \|y-X\beta\|_2^2+\lambda \|\beta\|_1,\quad \beta\in \mathbb{R}^p,$ where $$\lambda \geq 0$$ is a tuning parameter, controlling the level of sparsity in $$\hat {\beta}$$. It is assumed that the columns of $$X$$ are in general position in order to ensure uniqueness of the lasso solution, see [R. J. Tibshirani, Electron. J. Stat. 7, 1456–1490 (2013; Zbl 1337.62173)].
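As a minimal sketch of the objective above, the following pure-Python snippet evaluates $$Q(\beta; \lambda)$$ and, for the special case of a single predictor ($$p=1$$), checks it against the closed-form soft-thresholding minimizer. The function names `lasso_objective` and `soft_threshold` and the toy data are illustrative assumptions and do not come from the paper under review.

```python
def lasso_objective(y, X, beta, lam):
    """Q(beta; lam) = 0.5 * ||y - X beta||_2^2 + lam * ||beta||_1.

    X is a list of rows; y and beta are lists of floats.
    """
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for yi, row in zip(y, X)]
    return 0.5 * sum(r * r for r in residuals) + lam * sum(abs(b) for b in beta)


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Toy data with one predictor (assumed purely for illustration).
y = [1.0, 2.0, 3.0]
X = [[1.0], [1.0], [1.0]]
lam = 1.5

# For p = 1 the minimizer of Q is S(x^T y, lam) / (x^T x).
xty = sum(row[0] * yi for row, yi in zip(X, y))
xtx = sum(row[0] ** 2 for row in X)
beta_hat = [soft_threshold(xty, lam) / xtx]
q_hat = lasso_objective(y, X, beta_hat, lam)
```

Evaluating `lasso_objective` at coefficients slightly above or below `beta_hat` gives a larger value, which is a quick sanity check that the soft-thresholded value is the minimizer in this one-dimensional case.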
The path $$\hat {\beta} (\lambda)$$ is a piecewise linear function, with knots at values $$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r \geq 0$$. At $$\lambda=\infty$$, the solution $$\hat {\beta}(\infty)$$ has no active variables, and for decreasing $$\lambda$$, each knot $$\lambda_k$$ marks the entry or removal of some variable from the current active set. At any $$\lambda \geq 0$$, the corresponding active set $$A=\operatorname{supp}(\hat {\beta}(\lambda))$$ indexes a linearly independent set of predictor variables, that is, $$\operatorname{rank}(X_A)=|A|$$, where $$X_A$$ denotes the columns of $$X$$ in $$A$$.
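The piecewise linear structure is easiest to see in the special case where $$X$$ has orthonormal columns, an assumption made here only for illustration. In that case each coefficient is the soft-thresholded value of $$u_j=x_j^Ty$$, the knots are the sorted values $$|u_j|$$, and variables only enter the active set (removals, which can occur for a general design, do not arise). The names `orthonormal_lasso_path` and `knots` and the vector `u` are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def orthonormal_lasso_path(u, lam):
    """Lasso solution at lam when X has orthonormal columns and u = X^T y."""
    return [soft_threshold(uj, lam) for uj in u]


def knots(u):
    """Knots of the path in the orthonormal case: the values |u_j|, decreasing."""
    return sorted((abs(uj) for uj in u), reverse=True)


# Illustrative inner products u_j = x_j^T y (assumed values).
u = [3.0, -1.0, 2.0]
lams = knots(u)

# Size of the active set at values of lambda on each side of the knots.
active_sizes = [
    sum(b != 0.0 for b in orthonormal_lasso_path(u, lam))
    for lam in (3.5, 2.5, 1.5, 0.5)
]

# Linearity between two consecutive knots: the midpoint solution is the
# average of the solutions at the two knots.
left = orthonormal_lasso_path(u, lams[0])
right = orthonormal_lasso_path(u, lams[1])
mid = orthonormal_lasso_path(u, 0.5 * (lams[0] + lams[1]))
```

Here the active set grows by one variable each time $$\lambda$$ passes a knot, and `mid` coincides with the coordinate-wise average of `left` and `right`.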
Let $$A$$ be the active set just before $$\lambda_k$$, and suppose that predictor $$j$$ enters at $$\lambda_k$$. Denote by $$\hat {\beta}(\lambda_{k+1})$$ the solution at the next knot $$\lambda=\lambda_{k+1}$$, using the predictors $$A\cup\{j\}$$. Let $$\tilde{\beta}_A (\lambda_{k+1})$$ be the lasso solution using only the active predictors $$X_A$$, at $$\lambda=\lambda_{k+1}$$.
In the paper under review, the covariance test statistic is proposed, $T_k=\frac{1}{\sigma^2}\bigl\langle y, X\hat {\beta} (\lambda_{k+1})-X_A\tilde{\beta}_A (\lambda_{k+1})\bigr\rangle,$ where $$\langle \cdot,\cdot\rangle$$ denotes the Euclidean inner product. The main result, given in Theorem 3, states the following: under the null hypothesis that the current lasso model contains all truly active variables, $$\operatorname{supp}(\beta^*) \subseteq A$$, $$T_k$$ is asymptotically distributed as a standard exponential random variable, given reasonable assumptions on $$X$$ and on the magnitudes of the nonzero true coefficients. This statistic can be used to test the significance of an additional variable between two nested models when this additional variable is not fixed in advance but has been chosen adaptively.
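To make the definition of $$T_k$$ concrete, here is a minimal sketch for the special case of a design with orthonormal columns, which is assumed here only to keep the computation short. Both lasso fits then reduce to soft-thresholding $$u=X^Ty$$, and evaluating the definition directly gives $$T_k=\lambda_k(\lambda_k-\lambda_{k+1})/\sigma^2$$. The function name `covariance_statistic_orthonormal` and the numbers are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def covariance_statistic_orthonormal(u, k, sigma2):
    """T_k from its definition when X has orthonormal columns and u = X^T y.

    k is 1-based and must satisfy 1 <= k < len(u) so that lambda_{k+1} exists.
    """
    # Predictors ordered by entry into the path (largest |u_j| first).
    order = sorted(range(len(u)), key=lambda j: abs(u[j]), reverse=True)
    lam_next = abs(u[order[k]])      # lambda_{k+1}
    full = order[:k]                 # A together with the entering predictor j
    reduced = order[:k - 1]          # A alone
    # With orthonormal columns, <y, X_S b_S> = sum over i in S of u_i * b_i.
    fit_full = sum(u[i] * soft_threshold(u[i], lam_next) for i in full)
    fit_reduced = sum(u[i] * soft_threshold(u[i], lam_next) for i in reduced)
    return (fit_full - fit_reduced) / sigma2


# Illustrative inner products u_j = x_j^T y and a known noise variance.
u = [3.0, -1.0, 2.0]
sigma2 = 1.0
t1 = covariance_statistic_orthonormal(u, 1, sigma2)
t2 = covariance_statistic_orthonormal(u, 2, sigma2)
```

With knots $$3 \geq 2 \geq 1$$, the closed form gives $$T_1=3(3-2)=3$$ and $$T_2=2(2-1)=2$$, matching `t1` and `t2`. Under the null hypothesis these values would be compared with the standard exponential distribution.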
In Section 6, this result is modified for the case of unknown $$\sigma^2$$. Section 8 discusses some extensions to the elastic net, generalized linear models, and the Cox proportional hazards model; the proposals there are supported by simulations, but no theory is offered.

##### MSC:
62J07 Ridge regression; shrinkage estimators (Lasso)
62F03 Parametric hypothesis testing
62J05 Linear regression; mixed models
62J12 Generalized linear models (logistic models)
##### Keywords:
lasso; least angle regression; $$p$$-value; significance test
##### Software:
covTest; ElemStatLearn; glmnet; NESTA; PDCO; TFOCS
##### References:
Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. · Zbl 1175.94009 · doi:10.1137/080716542
Becker, S., Bobin, J. and Candès, E. J. (2011). NESTA: A fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4 1-39. · Zbl 1209.90265 · doi:10.1137/090756855
Becker, S. R., Candès, E. J. and Grant, M. C. (2011). Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3 165-218. · Zbl 1257.90042 · doi:10.1007/s12532-011-0029-5
Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 1-122. · Zbl 1229.90122 · doi:10.1561/2200000016
Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212-1242. · Zbl 1273.62173 · doi:10.3150/12-BEJSP11
Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by $$\ell_1$$ minimization. Ann. Statist. 37 2145-2177. · Zbl 1173.62053 · doi:10.1214/08-AOS653
Candès, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406-5425. · Zbl 1309.94033 · doi:10.1109/TIT.2006.885507
Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002 · doi:10.1137/S1064827596304010
de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, New York. · Zbl 1101.62002
Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289-1306. · Zbl 1288.94016 · doi:10.1109/TIT.2006.871582
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461-470. · Zbl 0621.62073 · doi:10.2307/2289236
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh-dimensional regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 37-65. · doi:10.1111/j.1467-9868.2011.01005.x
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1-22.
Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. · Zbl 1378.90064 · doi:10.1214/07-AOAS131
Fuchs, J. J. (2005). Recovery of exact sparse representations in the presence of bounded noise. IEEE Trans. Inform. Theory 51 3601-3608. · Zbl 1286.94031 · doi:10.1109/TIT.2005.855614
Grazier G’Sell, M., Taylor, J. and Tibshirani, R. (2013). Adaptive testing for the graphical lasso. Preprint. Available at . · arxiv.org
Grazier G’Sell, M., Wager, S., Chouldechova, A. and Tibshirani, R. (2013). False discovery rate control for sequential selection procedures, with application to the lasso. Preprint. Available at . · arxiv.org
Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
Hastie, T., Tibshirani, R. and Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York. · Zbl 1273.62005
Javanmard, A. and Montanari, A. (2013a). Confidence intervals and hypothesis testing for high-dimensional regression. Preprint. Available at . · Zbl 1319.62145 · arxiv.org
Javanmard, A. and Montanari, A. (2013b). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. Preprint. Available at . · Zbl 1360.62074 · arxiv.org
Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417-473. · doi:10.1111/j.1467-9868.2010.00740.x
Meinshausen, N., Meier, L. and Bühlmann, P. (2009). $$p$$-values for high-dimensional regression. J. Amer. Statist. Assoc. 104 1671-1681. · Zbl 1205.62089 · doi:10.1198/jasa.2009.tm08647
Minnier, J., Tian, L. and Cai, T. (2011). A perturbation method for inference on regularized regression estimates. J. Amer. Statist. Assoc. 106 1371-1382. · Zbl 1323.62076 · doi:10.1198/jasa.2011.tm10382
Osborne, M. R., Presnell, B. and Turlach, B. A. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-403. · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
Osborne, M. R., Presnell, B. and Turlach, B. A. (2000b). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319-337.
Park, M. Y. and Hastie, T. (2007). $$L_1$$-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 659-677. · doi:10.1111/j.1467-9868.2007.00607.x
Rhee, S.-Y., Gonzales, M. J., Kantor, R., Betts, B. J., Ravela, J. and Shafer, R. W. (2003). Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res. 31 298-303.
Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879-898. · Zbl 1452.62515 · doi:10.1093/biomet/ass043
Taylor, J., Loftus, J. and Tibshirani, R. J. (2013). Tests in adaptive regression via the Kac-Rice formula. Preprint. Available at . · Zbl 1337.62304 · arxiv.org
Taylor, J., Takemura, A. and Adler, R. J. (2005). Validity of the expected Euler characteristic heuristic. Ann. Probab. 33 1362-1396. · Zbl 1083.60031 · doi:10.1214/009117905000000099
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electron. J. Stat. 7 1456-1490. · Zbl 1337.62173 · doi:10.1214/13-EJS815
Tibshirani, R. J. and Taylor, J. (2012). Degrees of freedom in lasso problems. Ann. Statist. 40 1198-1232. · Zbl 1274.62469 · doi:10.1214/12-AOS1003
van de Geer, S. and Bühlmann, P. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. Preprint. Available at . · Zbl 1432.62112 · doi:10.1016/j.jspi.2013.03.006 · arxiv.org
Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $$\ell_1$$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178-2201. · Zbl 1173.62054 · doi:10.1214/08-AOS646
Weissman, I. (1978). Estimation of parameters and large quantiles based on the $$k$$ largest observations. J. Amer. Statist. Assoc. 73 812-815. · Zbl 0397.62034 · doi:10.2307/2286285
Zhang, C.-H. and Zhang, S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 217-242. · doi:10.1111/rssb.12026
Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008 · www.jmlr.org
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173-2192. · Zbl 1126.62061 · doi:10.1214/009053607000000127 · euclid:aos/1194461726
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.