
On model selection consistency of regularized M-estimators. (English) Zbl 1309.62044

Summary: Regularized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Usually the low-dimensional structure is encoded by the presence of the (unknown) parameters in some low-dimensional model subspace. In such settings, it is desirable for estimates of the model parameters to be model selection consistent: the estimates also fall in the model subspace. We develop a general framework for establishing consistency and model selection consistency of regularized M-estimators and show how it applies to some special cases of interest in statistical learning. Our analysis identifies two key properties of regularized M-estimators, referred to as geometric decomposability and irrepresentability, that ensure the estimators are consistent and model selection consistent.
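A regularized M-estimator here takes the standard form \(\hat{\theta} \in \arg\min_\theta \ell(\theta) + \lambda\,\rho(\theta)\), where \(\ell\) is a loss and \(\rho\) is a penalty encoding the low-dimensional structure. As a concrete illustration (a minimal sketch, not code from the paper), the snippet below takes the simplest special case, the \(\ell_1\)-penalized least-squares estimator (lasso), fits it to synthetic data whose true parameter lies in a sparse model subspace, and checks model selection consistency in the sense used above: whether the estimate falls in the same subspace, i.e., recovers the true support. The dimensions, noise level, and penalty scaling are illustrative choices, not values from the paper.

```python
# Minimal sketch: model selection consistency of the lasso, i.e. the
# regularized M-estimator  argmin_theta (1/2n)||y - X theta||^2 + lam ||theta||_1.
# All problem sizes and constants below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 50, 5                       # samples, ambient dimension, sparsity
theta_true = np.zeros(p)
theta_true[:s] = 1.0                       # true model subspace: first s coordinates
X = rng.standard_normal((n, p))            # iid Gaussian design (irrepresentability
                                           # holds with high probability here)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

lam = 0.1 * np.sqrt(np.log(p) / n)         # penalty at the usual sqrt(log p / n) rate
est = Lasso(alpha=lam).fit(X, y)

support_hat = np.flatnonzero(np.abs(est.coef_) > 1e-8)
support_true = np.flatnonzero(theta_true)
print("estimated support:", support_hat)
print("support recovered on this draw:",
      set(support_hat) == set(support_true))
```

Under the paper's two key properties, geometric decomposability of the penalty and an irrepresentability condition on the loss, such an estimate lands in the model subspace with probability tending to one; the sketch only verifies the event on a single random draw.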

MSC:

62F10 Point estimation

Software:

PDCO; glasso
