Sparsistency and rates of convergence in large covariance matrix estimation. (English) Zbl 1191.62101
Summary: This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order \((s_n \log p_n/n)^{1/2}\), where \(s_n\) is the number of nonzero elements, \(p_n\) is the size of the covariance matrix and \(n\) is the sample size. This makes explicit that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter \(\lambda_n\) goes to 0 are made explicit and compared across different penalties. As a result, for the \(L_1\)-penalty, to guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements must be small: \(s_n'=O(p_n)\) at most, among \(O(p_n^2)\) parameters, whether estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where \(s_n'\) is the number of nonzero off-diagonal elements. With the SCAD or hard-thresholding penalty functions, by contrast, no such restriction applies.
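The contrast driving the result is that the \(L_1\) penalty charges a constant marginal cost even for large entries (and thus biases them), while the SCAD penalty flattens out beyond \(a\lambda\), leaving large entries unshrunk. As a minimal illustration (not code from the paper), the following Python sketch evaluates the SCAD penalty of Fan and Li (2001) next to the \(L_1\) penalty; the knot parameter \(a = 3.7\) is the conventional default.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise.

    Linear (like L1) near zero, quadratic in the middle, and
    constant for |theta| > a*lam, so large entries are not shrunk.
    """
    t = np.abs(theta)
    small = lam * t                                          # |theta| <= lam
    mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    large = lam**2 * (a + 1) / 2                             # |theta| > a*lam
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

def l1_penalty(theta, lam):
    """L1 penalty: constant marginal cost, so even large entries stay biased."""
    return lam * np.abs(theta)

theta = np.linspace(0, 5, 6)
print(scad_penalty(theta, lam=1.0))  # flattens out beyond a*lam = 3.7
print(l1_penalty(theta, lam=1.0))    # keeps growing linearly
```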

MSC:
62H12 Estimation in multivariate analysis
62F12 Asymptotic properties of parametric estimators
65C60 Computational problems in statistics (MSC2010)
Software:
glasso
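glasso is the R implementation of the \(L_1\)-penalized Gaussian likelihood estimator of Friedman, Hastie and Tibshirani (2008). As a hedged sketch (an illustration on simulated data, not this paper's experiments), scikit-learn's `GraphicalLasso` fits the same kind of \(L_1\)-penalized likelihood; `alpha` plays the role of the tuning parameter \(\lambda_n\) discussed in the summary.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Simulate from a Gaussian whose precision matrix is sparse (tridiagonal).
p = 10
precision = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
cov = np.linalg.inv(precision)
X = rng.multivariate_normal(np.zeros(p), cov, size=500)

# L1-penalized likelihood estimate of the precision matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
n_zero = np.sum(np.abs(model.precision_) < 1e-8)
print(f"estimated zeros among {p * p} entries: {n_zero}")
```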