
zbMATH — the first resource for mathematics

On asymptotically optimal confidence regions and tests for high-dimensional models. (English) Zbl 1305.62259
Summary: We propose a general method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in a high-dimensional model. It can easily be adjusted for multiplicity, taking dependence among the tests into account. For linear models, our method is essentially the same as in [C.-H. Zhang and S. S. Zhang, “Confidence intervals for low dimensional parameters in high dimensional linear models”, J. R. Stat. Soc., Ser. B, Stat. Methodol. 76, 217–242 (2014)]; we analyze its asymptotic properties and establish its asymptotic optimality in terms of semiparametric efficiency. Our method naturally extends to generalized linear models with convex loss functions. We develop the corresponding theory, which includes a careful analysis for Gaussian, sub-Gaussian and bounded correlated designs.
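The linear-model case can be illustrated with the de-sparsified (de-biased) lasso idea, sketched here under several assumptions. First fit the lasso \(\hat\beta\). Then regress column \(X_j\) on the remaining columns with a nodewise lasso, giving \(\hat\gamma_j\) and the residual \(Z_j = X_j - X_{-j}\hat\gamma_j\), and set \(\hat\tau_j^2 = \|Z_j\|_2^2/n + \lambda_j\|\hat\gamma_j\|_1\). Finally apply the one-step correction \(\hat b_j = \hat\beta_j + Z_j^T(Y - X\hat\beta)/(n\hat\tau_j^2)\). An approximate interval is then \(\hat b_j \pm z_{1-\alpha/2}\,\hat\sigma\,\sqrt{\|Z_j\|_2^2/n}\,/(\hat\tau_j^2\sqrt{n})\).

This is a minimal, standard-library-only illustration, not the authors' code. The following choices are assumptions of the sketch, not the paper's prescriptions:
- The tuning parameters `lam` and `lam_node` are fixed by hand.
- \(\hat\sigma\) is a simple residual plug-in.
- The design is assumed centered, so no intercept is fitted.
- All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each. Returns (b, r) with r = y - X b.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual for the starting point b = 0
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # (1/n) X_k^T (partial residual that excludes coordinate k)
            rho = sum(X[i][k] * r[i] for i in range(n)) / n + col_sq[k] * b[k]
            new = soft_threshold(rho, lam) / col_sq[k]
            delta = new - b[k]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][k] * delta  # keep r = y - X b in sync
                b[k] = new
    return b, r


def desparsified_ci(X, y, j, lam, lam_node, level=0.95):
    """Hypothetical helper: de-biased estimate and normal CI for coefficient j."""
    n, p = len(X), len(X[0])
    beta, res = lasso_cd(X, y, lam)

    # Nodewise lasso: regress column j on all other columns.
    X_minus = [[row[k] for k in range(p) if k != j] for row in X]
    x_j = [row[j] for row in X]
    gamma, z = lasso_cd(X_minus, x_j, lam_node)  # z = X_j - X_{-j} gamma
    z_sq = sum(v * v for v in z) / n
    tau2 = z_sq + lam_node * sum(abs(g) for g in gamma)

    # One-step bias correction of the lasso coefficient.
    b_j = beta[j] + sum(z[i] * res[i] for i in range(n)) / (n * tau2)

    # Assumption: plug-in noise estimate from the lasso residuals.
    s_hat = sum(1 for v in beta if v != 0.0)
    sigma = math.sqrt(sum(v * v for v in res) / max(n - s_hat, 1))

    # Asymptotic standard error: sigma * sqrt(||z||^2 / n) / (tau^2 * sqrt(n)).
    se = sigma * math.sqrt(z_sq) / (tau2 * math.sqrt(n))
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return b_j, (b_j - q * se, b_j + q * se)


if __name__ == "__main__":
    # Usage on simulated data with a Gaussian design (illustrative values only).
    random.seed(1)
    n, p = 200, 10
    beta_true = [2.0, -1.0] + [0.0] * (p - 2)
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(X[i][k] * beta_true[k] for k in range(p)) + random.gauss(0, 1)
         for i in range(n)]
    est, (lo, hi) = desparsified_ci(X, y, j=0, lam=0.15, lam_node=0.15)
    print(f"b_0 = {est:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Only one nodewise regression is needed per coordinate of interest, which matches the Summary's focus on single or low-dimensional components. The interval's coverage is asymptotic, so finite-sample behavior in such a sketch depends on sparsity, design correlation, and the tuning choices.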

MSC:
62J07 Ridge regression; shrinkage estimators (Lasso)
62J12 Generalized linear models (logistic models)
62F12 Asymptotic properties of parametric estimators
62F25 Parametric tolerance and confidence regions
Software:
glasso; glmnet
References:
[1] Belloni, A., Chernozhukov, V. and Hansen, C. (2014). Inference on treatment effects after selection amongst high-dimensional controls. Rev. Econ. Stud. 81 608-650.
[2] Belloni, A., Chernozhukov, V. and Kato, K. (2013). Uniform postselection inference for LAD regression models. Available at arXiv.
[3] Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98 791-806. · Zbl 1228.62083 · doi:10.1093/biomet/asr043
[4] Belloni, A., Chernozhukov, V. and Wei, Y. (2013). Honest confidence regions for logistic regression with a large number of controls. Available at arXiv. · Zbl 06168762 · doi:10.3150/11-BEJ410
[5] Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802-837. · Zbl 1267.62080 · doi:10.1214/12-AOS1077
[6] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[7] Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559-583. · Zbl 1095.62077 · doi:10.1214/009053606000000092
[8] Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212-1242. · Zbl 1273.62173 · doi:10.3150/12-BEJSP11
[9] Bühlmann, P., Kalisch, M. and Meier, L. (2014). High-dimensional statistics with a view toward applications in biology. Annu. Rev. Stat. Appl. 1 255-278.
[10] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg. · Zbl 1273.62015
[11] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[12] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523 · euclid:aos/1201012958
[13] Chatterjee, A. and Lahiri, S. N. (2011). Bootstrapping lasso estimators. J. Amer. Statist. Assoc. 106 608-625. · Zbl 1232.62088 · doi:10.1198/jasa.2011.tm10159
[14] Chatterjee, A. and Lahiri, S. N. (2013). Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap. Ann. Statist. 41 1232-1259. · Zbl 1293.62153 · doi:10.1214/13-AOS1106
[15] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Mathematical Series 9. Princeton Univ. Press, Princeton, NJ. · Zbl 0063.01014
[16] Dümbgen, L., van de Geer, S. A., Veraar, M. C. and Wellner, J. A. (2010). Nemirovski’s inequalities revisited. Amer. Math. Monthly 117 138-160. · Zbl 1213.60039 · doi:10.4169/000298910X476059
[17] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849-911. · doi:10.1111/j.1467-9868.2008.00674.x
[18] Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica 20 101-148. · Zbl 1180.62080 · www3.stat.sinica.edu.tw
[19] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432-441. · Zbl 1143.62076 · doi:10.1093/biostatistics/kxm045
[20] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1-22.
[21] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[22] Javanmard, A. and Montanari, A. (2013). Confidence intervals and hypothesis testing for high-dimensional regression. Available at arXiv. · Zbl 1319.62145
[23] Javanmard, A. and Montanari, A. (2013). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. Available at arXiv. · Zbl 1360.62074
[24] Juditsky, A., Kilinç Karzan, F., Nemirovski, A. and Polyak, B. (2012). Accuracy guaranties for \(\ell_1\) recovery of block-sparse signals. Ann. Statist. 40 3077-3107. · Zbl 1296.62088 · doi:10.1214/12-AOS1057
[25] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[26] Lederer, J. and van de Geer, S. (2014). New concentration inequalities for suprema of empirical processes. Bernoulli. To appear. Available at arXiv. · Zbl 1355.60026
[27] Li, K.-C. (1989). Honest confidence regions for nonparametric regression. Ann. Statist. 17 1001-1008. · Zbl 0681.62047 · doi:10.1214/aos/1176347253
[28] Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 53-71. · Zbl 1400.62276 · doi:10.1111/j.1467-9868.2007.00627.x
[29] Meinshausen, N. (2013). Assumption-free confidence intervals for groups of variables in sparse high-dimensional regression. Available at arXiv. · Zbl 1327.62422
[30] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[31] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417-473. · doi:10.1111/j.1467-9868.2010.00740.x
[32] Meinshausen, N., Meier, L. and Bühlmann, P. (2009). \(p\)-values for high-dimensional regression. J. Amer. Statist. Assoc. 104 1671-1681. · Zbl 1205.62089 · doi:10.1198/jasa.2009.tm08647
[33] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246-270. · Zbl 1155.62050 · doi:10.1214/07-AOS582 · www.projecteuclid.org
[34] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of \(M\)-estimators with decomposable regularizers. Statist. Sci. 27 538-557. · Zbl 1331.62350 · doi:10.1214/12-STS400
[35] Nickl, R. and van de Geer, S. (2013). Confidence sets in sparse regression. Ann. Statist. 41 2852-2876. · Zbl 1288.62108 · doi:10.1214/13-AOS1170
[36] Portnoy, S. (1987). A central limit theorem applicable to robust regression estimators. J. Multivariate Anal. 22 24-50. · Zbl 0626.62033 · doi:10.1016/0047-259X(87)90073-X
[37] Pötscher, B. M. (2009). Confidence sets based on sparse estimators are necessarily large. Sankhyā 71 1-18. · Zbl 1192.62096 · sankhya.isical.ac.in
[38] Pötscher, B. M. and Leeb, H. (2009). On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding. J. Multivariate Anal. 100 2065-2082. · Zbl 1170.62046 · doi:10.1016/j.jmva.2009.06.010
[39] Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241-2259. · Zbl 1242.62071
[40] Robinson, P. M. (1988). Root-\(N\)-consistent semiparametric regression. Econometrica 56 931-954. · Zbl 0647.62100 · doi:10.2307/1912705
[41] Shah, R. D. and Samworth, R. J. (2013). Variable selection with error control: Another look at stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 75 55-80. · doi:10.1111/j.1467-9868.2011.01034.x
[42] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879-898. · Zbl 1452.62515 · doi:10.1093/biomet/ass043
[43] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538
[44] van de Geer, S. (2007). The deterministic Lasso. In JSM Proceedings, 2007, 140. Am. Statist. Assoc., Alexandria, VA.
[45] van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). Supplement to “On asymptotically optimal confidence regions and tests for high-dimensional models.” · Zbl 1305.62259 · dx.doi.org
[46] van de Geer, S. and Müller, P. (2012). Quasi-likelihood and/or robust estimation in high dimensions. Statist. Sci. 27 469-480. · Zbl 1331.62354 · doi:10.1214/12-STS397
[47] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323 · doi:10.1214/009053607000000929
[48] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[49] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183-2202. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[50] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178-2201. · Zbl 1173.62054 · doi:10.1214/08-AOS646
[51] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[52] Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 217-242. · doi:10.1111/rssb.12026
[53] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008 · www.jmlr.org