# zbMATH — the first resource for mathematics

A significance test for the lasso. (English) Zbl 1305.62254
The authors consider the linear regression model $y=X\beta^*+\varepsilon,\quad \varepsilon\sim N(0, \sigma^2I),$ where $$y\in \mathbb{R}^n$$ is an outcome vector, $$X\in \mathbb{R}^{n\times p}$$ is a design matrix, and $$\beta^*\in \mathbb{R}^p$$ is a vector of unknown coefficients to be estimated. The lasso estimator $$\hat {\beta} =\hat {\beta} (\lambda)$$ minimizes the objective function $Q(\beta; \lambda)=\frac{1}{2} \|y-X\beta\|_2^2+\lambda \|\beta\|_1,\quad \beta\in \mathbb{R}^p,$ where $$\lambda \geq 0$$ is a tuning parameter that controls the level of sparsity in $$\hat {\beta}$$. The columns of $$X$$ are assumed to be in general position, which ensures that the lasso solution is unique; see [R. J. Tibshirani, Electron. J. Stat. 7, 1456–1490 (2013; Zbl 1337.62173)].
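The objective $$Q(\beta;\lambda)$$ can be minimized numerically in several ways. The following Python/NumPy sketch uses cyclic coordinate descent with soft-thresholding, the algorithm behind the glmnet package, rather than the path algorithm analyzed in the paper. The names `soft_threshold` and `lasso_cd` are hypothetical, and the stopping rule (largest coefficient change in a sweep below `tol`) is an assumption.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=10_000, tol=1e-12):
    """Minimize Q(beta; lam) = 0.5 * ||y - X beta||_2^2 + lam * ||beta||_1
    by cyclic coordinate descent (a sketch; assumes X has no zero columns)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)            # ||X_j||_2^2 for each column j
    resid = np.asarray(y, dtype=float).copy()  # current residual y - X beta
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            # Inner product of X_j with the partial residual that excludes X_j.
            rho = X[:, j] @ resid + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)  # keep the residual in sync
                beta[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return beta
```

For an orthonormal design ($$X^TX=I$$) one sweep already returns $$S(X^Ty,\lambda)$$. For any design, the output can be checked against the KKT conditions: $$X_j^T(y-X\hat\beta)=\lambda\,\operatorname{sign}(\hat\beta_j)$$ for active $$j$$ and $$|X_j^T(y-X\hat\beta)|\le\lambda$$ otherwise.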
The solution path $$\hat {\beta} (\lambda)$$ is piecewise linear in $$\lambda$$, with knots at values $$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r \geq 0$$. At $$\lambda=\infty$$ the solution $$\hat {\beta}(\infty)=0$$ has no active variables, and as $$\lambda$$ decreases, each knot $$\lambda_k$$ marks the entry of some variable into, or the removal of some variable from, the current active set. At any $$\lambda \geq 0$$, the corresponding active set $$A=\operatorname{supp}(\hat {\beta}(\lambda))$$ indexes a linearly independent set of predictor variables, that is, $$\operatorname{rank}(X_A)=|A|$$, where $$X_A$$ denotes the columns of $$X$$ indexed by $$A$$.
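The knot structure is easiest to see for an orthonormal design, $$X^TX=I$$, which is assumed only for this sketch (it requires $$n\ge p$$). Then $$\hat\beta(\lambda)=S(X^Ty,\lambda)$$ coordinatewise, so variable $$j$$ enters at $$\lambda=|X_j^Ty|$$ and never leaves, and the knots are the values $$|X_j^Ty|$$ sorted in decreasing order. For any design, the first knot is $$\lambda_1=\max_j|X_j^Ty|$$, because the KKT conditions give $$\hat\beta(\lambda)=0$$ exactly when $$\lambda\ge\|X^Ty\|_\infty$$. The function names below are hypothetical.

```python
import numpy as np


def first_knot(X, y):
    """lambda_1 = max_j |X_j^T y|; beta_hat(lambda) = 0 iff lambda >= lambda_1."""
    return float(np.max(np.abs(X.T @ y)))


def orthonormal_lasso_solution(X, y, lam):
    """Closed-form lasso solution S(X^T y, lam); valid only if X^T X = I."""
    u = X.T @ y
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def orthonormal_lasso_knots(X, y):
    """Knots lambda_1 >= ... >= lambda_p and the order in which variables enter,
    assuming X^T X = I (then no variable ever leaves the active set)."""
    u = X.T @ y
    entry_order = np.argsort(-np.abs(u))  # indices by decreasing |X_j^T y|
    knots = np.abs(u)[entry_order]
    return knots, entry_order


# Usage: an orthonormal design obtained from a reduced QR decomposition.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((100, 5)))
y = rng.standard_normal(100)
knots, entry_order = orthonormal_lasso_knots(X, y)
for k in range(len(knots) - 1):
    mid = 0.5 * (knots[k] + knots[k + 1])
    active = np.flatnonzero(orthonormal_lasso_solution(X, y, mid))
    # Strictly between consecutive knots, the first k + 1 entrants are active.
    print(k + 1, sorted(active.tolist()), sorted(entry_order[:k + 1].tolist()))
```

Between consecutive knots the active set is fixed, so each coordinate of $$\hat\beta(\lambda)$$ is affine in $$\lambda$$ there, which is the piecewise linearity stated above.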
Let $$A$$ be the active set just before $$\lambda_k$$, and suppose that predictor $$j$$ enters at $$\lambda_k$$. Denote by $$\hat {\beta}(\lambda_{k+1})$$ the lasso solution at the next knot $$\lambda=\lambda_{k+1}$$, computed with the predictors in $$A\cup\{j\}$$, and let $$\tilde{\beta}_A (\lambda_{k+1})$$ be the lasso solution at the same value $$\lambda=\lambda_{k+1}$$ computed with only the active predictors $$X_A$$.
The paper proposes the covariance test statistic $T_k=\frac{1}{\sigma^2}\bigl\langle y,\, X\hat {\beta} (\lambda_{k+1})-X_A\tilde{\beta}_A (\lambda_{k+1})\bigr\rangle.$ The main result, Theorem 3, states the following: under the null hypothesis that the current lasso model contains all truly active variables, $$\operatorname{supp}(\beta^*) \subseteq A$$, $$T_k$$ is asymptotically distributed as a standard exponential random variable, $$\mathrm{Exp}(1)$$, given reasonable assumptions on $$X$$ and on the magnitudes of the nonzero true coefficients. The statistic can therefore be used to test the significance of the additional variable in two nested models, even though that variable is not fixed in advance but is chosen adaptively.
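A direct transcription of the definition of $$T_k$$ may help. The Python/NumPy sketch below (hypothetical function names) fits the two lasso problems at $$\lambda_{k+1}$$ by coordinate descent, a solver choice made here rather than the paper's, and assumes that $$A$$, $$j$$ and the next knot $$\lambda_{k+1}$$ are supplied by a path algorithm such as LARS; for $$k=1$$, $$A$$ is empty and the reduced fit is taken to be zero. For an orthonormal design both fits are soft-thresholdings of $$X^Ty$$, and the definition collapses to $$T_k=\lambda_k(\lambda_k-\lambda_{k+1})/\sigma^2$$. The usage lines check this and then use it to simulate $$T_1$$ under the global null $$\beta^*=0$$ (the case $$A=\emptyset$$ of Theorem 3), where the limiting law is $$\mathrm{Exp}(1)$$.

```python
import numpy as np


def lasso_cd(X, y, lam, n_sweeps=10_000, tol=1e-12):
    """Cyclic coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return beta


def covariance_test_stat(X, y, A, j, lam_next, sigma2):
    """T_k = <y, X beta_hat(lam_next) - X_A beta_tilde_A(lam_next)> / sigma^2,
    where beta_hat uses the predictors A + [j] and beta_tilde_A only A."""
    full = list(A) + [j]
    fit_full = X[:, full] @ lasso_cd(X[:, full], y, lam_next)
    if len(A) == 0:
        fit_reduced = np.zeros(len(y))  # convention for k = 1
    else:
        fit_reduced = X[:, list(A)] @ lasso_cd(X[:, list(A)], y, lam_next)
    return float(y @ (fit_full - fit_reduced)) / sigma2


# Usage 1: orthonormal design, where the knots are the sorted |X_j^T y|.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((200, 20)))
y = rng.standard_normal(200)  # global null: beta* = 0, sigma^2 = 1
u = X.T @ y
order = np.argsort(-np.abs(u))
knots = np.abs(u)[order]      # knots[k - 1] is lambda_k (1-based in the text)
for k in (1, 2, 3):
    T_k = covariance_test_stat(X, y, order[:k - 1], order[k - 1], knots[k], 1.0)
    closed_form = knots[k - 1] * (knots[k - 1] - knots[k])
    print(k, T_k, closed_form)


# Usage 2: Monte Carlo for T_1 under the global null with X^T X = I.
# Then X^T y ~ N(0, sigma^2 I_p), so U = X^T y can be simulated directly.
def simulate_T1(p, reps, rng, sigma=1.0):
    U = np.abs(rng.normal(0.0, sigma, size=(reps, p)))
    top2 = -np.sort(-U, axis=1)[:, :2]  # lambda_1 >= lambda_2 per replicate
    return top2[:, 0] * (top2[:, 0] - top2[:, 1]) / sigma**2


T1 = simulate_T1(p=1000, reps=4000, rng=rng)
print(T1.mean(), np.mean(T1 > 1.0), np.exp(-1.0))  # Exp(1): mean 1, tail e^-1
```

The limit is an extreme-value result in $$p$$, so for moderate $$p$$ the simulated mean and tail probability are only roughly those of $$\mathrm{Exp}(1)$$.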
In Section 6, this result is modified for the case of unknown $$\sigma^2$$. Section 8 discusses some extensions to the elastic net, generalized linear models, and the Cox proportional hazards model; the proposals there are supported by simulations, but no theory is offered.
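One natural plug-in for unknown $$\sigma^2$$ is sketched below under the assumption $$n>p$$, with hypothetical function names; it is not necessarily the exact form used in Section 6. It estimates $$\sigma^2$$ by $$\hat\sigma^2=\|y-X\hat\beta^{\mathrm{LS}}\|_2^2/(n-p)$$ from the full least squares fit. Since $$\mathrm{Exp}(1)$$ is the law of $$\chi^2_2/2$$, and under Gaussian errors $$\hat\sigma^2$$ is independent of $$X^Ty$$ (through which both lasso fits and the knots depend on $$y$$), the ratio $$\langle y, X\hat\beta(\lambda_{k+1})-X_A\tilde\beta_A(\lambda_{k+1})\rangle/\hat\sigma^2$$ is then approximately $$F(2,n-p)$$. With $$m=n-p$$, its survival function has the closed form $$P(F>x)=(1+2x/m)^{-m/2}$$.

```python
import numpy as np


def sigma2_hat_ls(X, y):
    """Residual variance from the full least squares fit (assumes n > p, full rank)."""
    n, p = X.shape
    beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ls
    return float(resid @ resid) / (n - p)


def pvalue_known_sigma(T):
    """P(Exp(1) > T) = exp(-T): the reference law when sigma^2 is known."""
    return float(np.exp(-T))


def pvalue_f2(F, m):
    """P(F(2, m) > F) = (1 + 2F/m)^(-m/2); tends to exp(-F) as m -> infinity."""
    return float((1.0 + 2.0 * F / m) ** (-m / 2.0))


# Usage: check the closed form against simulated Exp(1) / (chi^2_m / m) draws.
rng = np.random.default_rng(0)
m = 20
draws = rng.exponential(1.0, 200_000) / (rng.chisquare(m, 200_000) / m)
print(np.mean(draws > 1.5), pvalue_f2(1.5, m))

# ... and the variance estimate on simulated data with sigma^2 = 4 (illustrative).
n, p = 2000, 10
X = rng.standard_normal((n, p))
y = X @ np.r_[2.0, np.zeros(p - 1)] + 2.0 * rng.standard_normal(n)
print(sigma2_hat_ls(X, y))
```

The closed form follows from $$E[e^{-xW/m}]=(1+2x/m)^{-m/2}$$ for $$W\sim\chi^2_m$$, which avoids depending on a statistics library for the $$F$$ tail.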

##### MSC:
 62J07 Ridge regression; shrinkage estimators (Lasso) 62F03 Parametric hypothesis testing 62J05 Linear regression; mixed models 62J12 Generalized linear models (logistic models)
##### Keywords:
lasso; least angle regression; $$p$$-value; significance test
##### Software:
covTest; ElemStatLearn; glmnet; NESTA; PDCO; TFOCS
##### References:
[1] Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183-202. · Zbl 1175.94009 · doi:10.1137/080716542
[2] Becker, S., Bobin, J. and Candès, E. J. (2011). NESTA: A fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4 1-39. · Zbl 1209.90265 · doi:10.1137/090756855
[3] Becker, S. R., Candès, E. J. and Grant, M. C. (2011). Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3 165-218. · Zbl 1257.90042 · doi:10.1007/s12532-011-0029-5
[4] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 1-122. · Zbl 1229.90122 · doi:10.1561/2200000016
[5] Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212-1242. · Zbl 1273.62173 · doi:10.3150/12-BEJSP11
[6] Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by $$\ell_1$$ minimization. Ann. Statist. 37 2145-2177. · Zbl 1173.62053 · doi:10.1214/08-AOS653
[7] Candès, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406-5425. · Zbl 1309.94033 · doi:10.1109/TIT.2006.885507
[8] Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002 · doi:10.1137/S1064827596304010
[9] de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, New York. · Zbl 1101.62002
[10] Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289-1306. · Zbl 1288.94016 · doi:10.1109/TIT.2006.871582
[11] Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461-470. · Zbl 0621.62073 · doi:10.2307/2289236
[12] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[13] Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh-dimensional regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 37-65. · doi:10.1111/j.1467-9868.2011.01005.x
[14] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1-22.
[15] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302-332. · Zbl 1378.90064 · doi:10.1214/07-AOAS131
[16] Fuchs, J. J. (2005). Recovery of exact sparse representations in the presence of bounded noise. IEEE Trans. Inform. Theory 51 3601-3608. · Zbl 1286.94031 · doi:10.1109/TIT.2005.855614
[17] Grazier G’Sell, M., Taylor, J. and Tibshirani, R. (2013). Adaptive testing for the graphical lasso. Preprint. · arxiv.org
[18] Grazier G’Sell, M., Wager, S., Chouldechova, A. and Tibshirani, R. (2013). False discovery rate control for sequential selection procedures, with application to the lasso. Preprint. · arxiv.org
[19] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[20] Hastie, T., Tibshirani, R. and Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York. · Zbl 1273.62005
[21] Javanmard, A. and Montanari, A. (2013a). Confidence intervals and hypothesis testing for high-dimensional regression. Preprint. · Zbl 1319.62145 · arxiv.org
[22] Javanmard, A. and Montanari, A. (2013b). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. Preprint. · Zbl 1360.62074 · arxiv.org
[23] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417-473. · doi:10.1111/j.1467-9868.2010.00740.x
[24] Meinshausen, N., Meier, L. and Bühlmann, P. (2009). $$p$$-values for high-dimensional regression. J. Amer. Statist. Assoc. 104 1671-1681. · Zbl 1205.62089 · doi:10.1198/jasa.2009.tm08647
[25] Minnier, J., Tian, L. and Cai, T. (2011). A perturbation method for inference on regularized regression estimates. J. Amer. Statist. Assoc. 106 1371-1382. · Zbl 1323.62076 · doi:10.1198/jasa.2011.tm10382
[26] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389-403. · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[27] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000b). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319-337.
[28] Park, M. Y. and Hastie, T. (2007). $$L_1$$-regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 659-677. · doi:10.1111/j.1467-9868.2007.00607.x
[29] Rhee, S.-Y., Gonzales, M. J., Kantor, R., Betts, B. J., Ravela, J. and Shafer, R. W. (2003). Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res. 31 298-303.
[30] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879-898. · Zbl 1452.62515 · doi:10.1093/biomet/ass043
[31] Taylor, J., Loftus, J. and Tibshirani, R. J. (2013). Tests in adaptive regression via the Kac-Rice formula. Preprint. · Zbl 1337.62304 · arxiv.org
[32] Taylor, J., Takemura, A. and Adler, R. J. (2005). Validity of the expected Euler characteristic heuristic. Ann. Probab. 33 1362-1396. · Zbl 1083.60031 · doi:10.1214/009117905000000099
[33] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[34] Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electron. J. Stat. 7 1456-1490. · Zbl 1337.62173 · doi:10.1214/13-EJS815
[35] Tibshirani, R. J. and Taylor, J. (2012). Degrees of freedom in lasso problems. Ann. Statist. 40 1198-1232. · Zbl 1274.62469 · doi:10.1214/12-AOS1003
[36] van de Geer, S. and Bühlmann, P. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. Preprint. · Zbl 1432.62112 · doi:10.1016/j.jspi.2013.03.006 · arxiv.org
[37] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $$\ell_1$$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[38] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178-2201. · Zbl 1173.62054 · doi:10.1214/08-AOS646
[39] Weissman, I. (1978). Estimation of parameters and large quantiles based on the $$k$$ largest observations. J. Amer. Statist. Assoc. 73 812-815. · Zbl 0397.62034 · doi:10.2307/2286285
[40] Zhang, C.-H. and Zhang, S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 217-242. · doi:10.1111/rssb.12026
[41] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008 · www.jmlr.org
[42] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
[43] Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173-2192. · Zbl 1126.62061 · doi:10.1214/009053607000000127 · euclid:aos/1194461726
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.