
A tuning-free robust and efficient approach to high-dimensional regression. (English) Zbl 1452.62525

Summary: We introduce a novel approach to high-dimensional regression with theoretical guarantees. The new procedure overcomes the challenge of tuning parameter selection for the Lasso and possesses several appealing properties. It uses an easily simulated tuning parameter that automatically adapts to both the unknown random error distribution and the correlation structure of the design matrix. It is robust, with substantial efficiency gains for heavy-tailed random errors, while maintaining high efficiency for normal random errors. Compared with alternative robust regression procedures, it also enjoys the property of being equivariant under scale transformations of the response variable. Computationally, it can be solved efficiently via linear programming. Theoretically, under weak conditions on the random error distribution, we establish a finite-sample error bound with a near-oracle rate for the new estimator with the simulated tuning parameter. Our results help bridge the gap between the theory and practice of the Lasso and its variants. We also prove that a further gain in efficiency can be achieved by a second-stage enhancement with some light tuning. Our simulation results demonstrate that the proposed methods often outperform cross-validated Lasso in various settings.
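
The summary does not state the loss function explicitly, so the following is only a minimal sketch of the kind of construction it describes, assuming a Jaeckel-type rank-dispersion loss with an l1 penalty (a natural robust candidate); the function name and the constants n_sim, alpha and c are illustrative choices, not values taken from the paper. The point of the sketch is the "easily simulated tuning parameter": the gradient of the rank-dispersion loss at the true coefficient vector depends on the errors only through their ranks, which form a uniform random permutation whatever the (continuous) error distribution, so a suitable penalty level can be simulated directly from the design matrix.

```python
import numpy as np

def simulate_lambda(X, n_sim=500, alpha=0.10, c=1.01, seed=None):
    """Simulate a penalty level for a rank-based lasso (illustrative sketch).

    The statistic simulated below is the sup-norm of the gradient of the
    rank-dispersion loss (1/(n(n-1))) * sum_{i != j} |e_i - e_j| evaluated at
    the true coefficients, with the ranks of the errors replaced by a
    uniformly random permutation.  Its distribution depends only on X, so the
    resulting lambda adapts to the design without knowledge of the error law.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    stats = np.empty(n_sim)
    for b in range(n_sim):
        ranks = rng.permutation(n) + 1             # ranks of i.i.d. continuous errors
        scores = 2.0 * ranks - (n + 1.0)           # centred Wilcoxon-type scores
        grad = (2.0 / (n * (n - 1))) * (X.T @ scores)
        stats[b] = np.max(np.abs(grad))            # dual norm of the l1 penalty
    return c * np.quantile(stats, 1.0 - alpha)

# Illustrative use on a synthetic design:
# X = np.random.default_rng(0).standard_normal((200, 500))
# lam = simulate_lambda(X)
```

With the penalty level fixed this way, the penalized rank-dispersion objective is piecewise linear in the coefficients, so the estimator can be computed with a standard linear-programming solver, consistent with the computational claim in the summary.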

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models

Software:

PDCO; QICD; flare; glmnet
