# zbMATH — the first resource for mathematics

Operator norm consistent estimation of large-dimensional sparse covariance matrices. (English) Zbl 1196.62064
Summary: Estimating covariance matrices is a problem of fundamental importance in multivariate statistics. In practice it is increasingly frequent to work with data matrices $$X$$ of dimension $$n\times p$$, where $$p$$ and $$n$$ are both large. Results from random matrix theory show very clearly that in this setting, standard estimators like the sample covariance matrix perform in general very poorly.
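The poor behaviour of the sample covariance matrix when $$p$$ and $$n$$ are comparable can be seen in a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes Gaussian data with identity population covariance and $$p = n = 200$$, and the helper names `sample_covariance` and `operator_norm` are hypothetical.

```python
import numpy as np


def sample_covariance(X):
    """Sample covariance of an n x p data matrix whose rows are observations."""
    Xc = X - X.mean(axis=0, keepdims=True)
    S = Xc.T @ Xc / (X.shape[0] - 1)
    return (S + S.T) / 2  # enforce exact symmetry against rounding


def operator_norm(A):
    """Operator (spectral) norm: the largest singular value of A."""
    return np.linalg.norm(A, 2)


rng = np.random.default_rng(0)
n = p = 200                       # assumed sizes, chosen so that p / n = 1
X = rng.standard_normal((n, p))   # population covariance is the identity
S = sample_covariance(X)

# Every population eigenvalue equals 1, yet the sample eigenvalues spread widely.
eigs = np.linalg.eigvalsh(S)
err_sample = operator_norm(S - np.eye(p))
print(f"smallest / largest sample eigenvalue: {eigs.min():.3f} / {eigs.max():.3f}")
print(f"operator-norm error of the sample covariance: {err_sample:.3f}")
```

Although every population eigenvalue equals 1 in this setup, the sample eigenvalues range from near 0 to well above 2, so the operator-norm error does not shrink when $$p$$ grows in proportion to $$n$$.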
In this “large $$n$$, large $$p$$” setting, practitioners are sometimes willing to assume that many elements of the population covariance matrix are equal to 0, and hence that this matrix is sparse. We develop an estimator to handle this situation. The estimator is shown to be consistent in the operator norm when, for instance, we have $$p\asymp n$$ as $$n\rightarrow \infty$$. In other words, the largest singular value of the difference between the estimator and the population covariance matrix goes to zero. This implies consistency of all the eigenvalues and consistency of eigenspaces associated with isolated eigenvalues. We also propose a notion of sparsity for matrices that is “compatible” with spectral analysis and is independent of the ordering of the variables.
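The summary does not spell out the estimator, so the following is only a minimal sketch under stated assumptions: it uses entrywise hard thresholding of the sample covariance matrix, a tridiagonal population covariance, and a threshold of the form $$c\sqrt{\log p / n}$$ with $$c = 2$$. All of these choices, and the function name `hard_threshold`, are assumptions made for illustration. The final check uses Weyl's inequality, which bounds the difference between the sorted eigenvalues of two symmetric matrices by the operator norm of their difference; this is why operator-norm consistency yields consistency of all the eigenvalues.

```python
import numpy as np


def hard_threshold(S, t):
    """Zero out off-diagonal entries of S with absolute value below t (assumed rule)."""
    T = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))  # the diagonal is always kept
    return T


rng = np.random.default_rng(1)
n, p = 400, 200  # assumed sizes with p of the same order as n

# Assumed sparse population covariance: 1 on the diagonal, 0.4 next to it.
Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))

# Draw n Gaussian rows with covariance Sigma via its Cholesky factor.
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T

# Sample covariance, symmetrized against rounding.
Xc = X - X.mean(axis=0, keepdims=True)
S = Xc.T @ Xc / (n - 1)
S = (S + S.T) / 2

t = 2.0 * np.sqrt(np.log(p) / n)  # threshold level; the constant 2 is an assumption
S_thr = hard_threshold(S, t)

# Operator-norm errors (largest singular value of the difference).
err_sample = np.linalg.norm(S - Sigma, 2)
err_thr = np.linalg.norm(S_thr - Sigma, 2)

# Weyl's inequality: sorted eigenvalues differ by at most the operator-norm error.
eig_gap = np.max(np.abs(np.linalg.eigvalsh(S_thr) - np.linalg.eigvalsh(Sigma)))

print(f"operator-norm error, sample covariance: {err_sample:.3f}")
print(f"operator-norm error, thresholded:       {err_thr:.3f}")
print(f"largest eigenvalue discrepancy:         {eig_gap:.3f}")
```

In this setup the thresholded matrix is much closer to the population covariance in operator norm than the raw sample covariance is. Because thresholding acts entry by entry, the result does not depend on how the variables are ordered.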

##### MSC:
62H12 Estimation in multivariate analysis
15A18 Eigenvalues, singular values, and eigenvectors