Recent developments in high dimensional covariance estimation and its related issues, a review. (English) Zbl 1395.62124

Summary: In this paper we review some of the recent developments in high dimensional data analysis. We focus on the estimation of covariance and precision matrices and on asymptotic results for the eigenstructure in principal component analysis. We also cover related issues such as testing the equality of two covariance matrices, determining the number of principal components, and detecting hubs in a complex network.

MSC:

62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis

Software:

glasso; MIM; spcov

References:

[1] Ahn, S. C.; Horenstein, A. R., Eigenvalue ratio test for the number of factors, Econometrica, 81, 1203-1227, (2013) · Zbl 1274.62403
[2] Alessi, L.; Barigozzi, M.; Capasso, M., Improved penalization for determining the number of factors in approximate factor models, Statistics & Probability Letters, 80, 1806-1813, (2010) · Zbl 1202.62081
[3] Bai, Z. D., Convergence rate of expected spectral distributions of large random matrices, The Annals of Probability, 21, 649-672, (1993) · Zbl 0779.60025
[4] Bai, J.; Li, K., Statistical analysis of factor models of high dimension, The Annals of Statistics, 40, 437-465, (2012) · Zbl 1246.62144
[5] Bai, J.; Ng, S., Determining the number of factors in approximate factor models, Econometrica, 70, 191-221, (2002) · Zbl 1103.91399
[6] Bai, Z. D.; Yin, Y. Q., Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix, The Annals of Probability, 21, 1275-1294, (1993) · Zbl 0779.60026
[7] Bao, Z. G.; Pan, G. M.; Zhou, W., Tracy-Widom law for the extreme eigenvalues of sample correlation matrices, Preprint, arXiv:1110.5208, (2011)
[8] Berthet, Q.; Rigollet, P., Optimal detection of sparse principal components in high dimension, The Annals of Statistics, 41, 1780-1815, (2013) · Zbl 1277.62155
[9] Bickel, P. J.; Levina, E., Covariance regularization by thresholding, The Annals of Statistics, 36, 2577-2604, (2008) · Zbl 1196.62062
[10] Bickel, P. J.; Levina, E., Regularized estimation of large covariance matrices, The Annals of Statistics, 36, 199-227, (2008) · Zbl 1132.62040
[11] Bien, J.; Tibshirani, R. J., Sparse estimation of a covariance matrix, Biometrika, 98, 807-820, (2011) · Zbl 1228.62063
[12] Birnbaum, A.; Johnstone, I. M.; Nadler, B.; Paul, D., Minimax bounds for sparse PCA with noisy high-dimensional data, The Annals of Statistics, 41, 1055-1084, (2013) · Zbl 1292.62071
[13] Bonacich, P., Power and centrality: A family of measures, American Journal of Sociology, 92, 1170-1182, (1987)
[14] Butte, A. J.; Tamayo, P.; Slonim, D.; Golub, T. R.; Kohane, I. S., Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks, Proceedings of the National Academy of Sciences, 97, 12182-12186, (2000)
[15] Cai, T. T.; Liu, W., A direct estimation approach to sparse linear discriminant analysis, Journal of the American Statistical Association, 106, 1566-1577, (2011) · Zbl 1233.62129
[16] Cai, T. T.; Liu, W.; Luo, X., A constrained \(l_1\) minimization approach to sparse precision matrix estimation, Journal of the American Statistical Association, 106, 672-684, (2011) · Zbl 1232.62086
[17] Cai, T. T.; Liu, W.; Xia, Y., Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings, Journal of the American Statistical Association, 108, 265-277, (2013) · Zbl 06158341
[18] Cai, T. T.; Liu, W.; Zhou, H. H., Estimating sparse precision matrix: optimal rates of convergence and adaptive estimation, The Annals of Statistics, 44, 455-488, (2016) · Zbl 1341.62115
[19] Cai, T. T.; Ma, Z.; Wu, Y., Optimal estimation and rank detection for sparse spiked covariance matrices, Probability Theory and Related Fields, 161, 781-815, (2015) · Zbl 1314.62130
[20] Cai, T. T.; Ren, Z.; Zhou, H. H., Optimal rates of convergence for estimating Toeplitz covariance matrices, Probability Theory and Related Fields, 156, 101-143, (2013) · Zbl 06176807
[21] Cai, T. T.; Yuan, M., Adaptive covariance matrix estimation through block thresholding, The Annals of Statistics, 40, 2014-2042, (2012) · Zbl 1257.62060
[22] Cai, T. T.; Zhang, C. H.; Zhou, H. H., Optimal rates of convergence for covariance matrix estimation, The Annals of Statistics, 38, 2118-2144, (2010) · Zbl 1202.62073
[23] Cai, T. T.; Zhou, H. H., Minimax estimation of large covariance matrices under \(l_1\) norm (with discussion), Statistica Sinica, 22, 1319-1378, (2012) · Zbl 1266.62036
[24] Chandrasekaran, V.; Parrilo, P. A.; Willsky, A. S., Latent variable graphical model selection via convex optimization, The Annals of Statistics, 40, 1935-1967, (2012) · Zbl 1257.62061
[25] Chaudhuri, S.; Alur, R.; Cerny, P., Model checking on trees with path equivalences, (13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, (2007)) · Zbl 1186.68273
[26] Choi, Y.; Taylor, J.; Tibshirani, R., Selecting the number of principal components: estimation of the rank of a noisy matrix, The Annals of Statistics, 45, 2590-2617, (2017) · Zbl 1394.62073
[27] Chun, M.; Kim, C.; Chang, I., Uncovering multiloci-ordering by algebraic property of Laplacian matrix and its Fiedler vector, Bioinformatics, 32, 801-807, (2016)
[28] Dempster, A. P., Covariance selection, Biometrics, 28, 157-175, (1972)
[29] Edwards, D., Introduction to graphical modelling, (2000), Springer New York · Zbl 0952.62003
[30] El Karoui, N., Operator norm consistent estimation of large-dimensional sparse covariance matrices, The Annals of Statistics, 36, 2717-2756, (2008) · Zbl 1196.62064
[31] El Karoui, N., Spectrum estimation for large dimensional covariance matrices using random matrix theory, The Annals of Statistics, 36, 2757-2790, (2008) · Zbl 1168.62052
[32] Fan, J.; Fan, Y.; Lv, J., High dimensional covariance matrix estimation using a factor model, Journal of Econometrics, 147, 186-197, (2008) · Zbl 1429.62185
[33] Fan, J.; Liao, Y.; Liu, H., An overview on the estimation of large covariance and precision matrices, The Econometrics Journal, 19, C1-C32, (2016) · Zbl 1521.62083
[34] Fan, J.; Liao, Y.; Mincheva, M., High-dimensional covariance matrix estimation in approximate factor models, The Annals of Statistics, 39, 3320-3356, (2011) · Zbl 1246.62151
[35] Fan, J.; Liao, Y.; Mincheva, M., Large covariance estimation by thresholding principal orthogonal complements (with discussion), Journal of the Royal Statistical Society. Series B, 75, 603-680, (2013) · Zbl 1411.62138
[36] Fan, J.; Liao, Y.; Wang, W., Projected principal component analysis in factor models, The Annals of Statistics, 44, 219-254, (2016) · Zbl 1331.62295
[37] Friedman, J.; Hastie, T.; Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9, 432-441, (2008) · Zbl 1143.62076
[38] Hallin, M.; Liška, R., Determining the number of factors in the general dynamic factor model, Journal of the American Statistical Association, 102, 603-617, (2007) · Zbl 1172.62339
[39] Hong, Y., A study on the adjacency matrix and hub in networks, (2015), Pusan National University, Unpublished
[40] Huang, J. Z.; Liu, N.; Pourahmadi, M.; Liu, L., Covariance matrix selection and estimation via penalised normal likelihood, Biometrika, 93, 85-98, (2006) · Zbl 1152.62346
[41] Johnstone, I. M., On the distribution of the largest eigenvalue in principal component analysis, The Annals of Statistics, 29, 295-327, (2001) · Zbl 1016.62078
[42] Johnstone, I. M., Multivariate analysis and Jacobi ensembles: largest eigenvalue, Tracy-Widom limits and rates of convergence, The Annals of Statistics, 36, 2638-2716, (2008) · Zbl 1284.62320
[43] Johnstone, I. M.; Lu, A. Y., On consistency and sparsity for principal components analysis in high dimensions (with discussion), Journal of the American Statistical Association, 104, 682-693, (2009) · Zbl 1388.62174
[44] Jolliffe, I. T., Principal component analysis, (2002), Springer New York · Zbl 1011.62064
[45] Katz, L., A new status index derived from sociometric analysis, Psychometrika, 18, 39-43, (1953) · Zbl 0053.27606
[46] Kim, C.; Cheon, M.; Kang, M.; Chang, I., A simple and exact Laplacian clustering of complex networking phenomena: application to gene expression profiles, Proceedings of the National Academy of Sciences, 105, 4083-4087, (2008)
[47] Lam, C.; Fan, J., Sparsitency and rates of convergence in large covariance matrices, The Annals of Statistics, 37, 4254-4278, (2009) · Zbl 1191.62101
[48] Lam, C.; Yao, Q., Factor modeling for high-dimensional time series: inference for the number of factors, The Annals of Statistics, 40, 694-726, (2012) · Zbl 1273.62214
[49] Lam, C.; Yao, Q.; Bathia, N., Estimation of latent factors for high-dimensional time series, Biometrika, 98, 901-918, (2011) · Zbl 1228.62110
[50] Levina, E.; Vershynin, R., Partial estimation of covariance matrices, Probability Theory and Related Fields, 153, 405-419, (2012) · Zbl 1318.62179
[51] Li, J.; Chen, S. X., Two sample tests for high-dimensional covariance matrices, The Annals of Statistics, 40, 908-940, (2012) · Zbl 1274.62383
[52] Ma, Z., Sparse principal component analysis and iterative thresholding, The Annals of Statistics, 41, 772-801, (2013) · Zbl 1267.62074
[53] Marcenko, V. A.; Pastur, L. A., Distribution of eigenvalues for some sets of random matrices, Mathematics of the USSR - Sbornik, 1, 507-536, (1967) · Zbl 0152.16101
[54] Mardia, K. V.; Kent, J. T.; Bibby, J. M., Multivariate analysis, (1979), Academic Press New York · Zbl 0432.62029
[55] Meinshausen, N.; Bühlmann, P., High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, 34, 1436-1462, (2006) · Zbl 1113.62082
[56] Van Mieghem, P., Graph spectra for complex networks, (2010), Cambridge University Press New York
[57] Nadler, B., Finite sample approximation results for principal component analysis: A matrix perturbation approach, The Annals of Statistics, 36, 2791-2817, (2008) · Zbl 1168.62058
[58] Newman, M., Networks: An introduction, (2010), Oxford University Press New York · Zbl 1195.94003
[59] Paul, D., Asymptotics of sample eigenstructure for a large dimensional spiked covariance model, Statistica Sinica, 17, 1617-1642, (2007) · Zbl 1134.62029
[60] Peng, J.; Wang, P.; Zhou, N.; Zhu, J., Partial correlation estimation by joint sparse regression models, Journal of the American Statistical Association, 104, 735-746, (2009) · Zbl 1388.62046
[61] Pillai, N. S.; Yin, J., Edge universality of correlation matrices, The Annals of Statistics, 40, 1737-1763, (2012) · Zbl 1260.15051
[62] Pourahmadi, M., High-dimensional covariance estimation, (2013), John Wiley & Sons New York
[63] Rothman, A. J.; Levina, E.; Zhu, J., Generalized thresholding of large covariance matrices, Journal of the American Statistical Association, 104, 177-186, (2009) · Zbl 1388.62170
[64] Schott, J. R., A test for the equality of covariance matrices when the dimension is large relative to the sample sizes, Computational Statistics & Data Analysis, 51, 6535-6542, (2007) · Zbl 1445.62121
[65] Shen, D.; Shen, H.; Marron, J. S., Consistency of sparse PCA in high dimension, low sample size contexts, Journal of Multivariate Analysis, 115, 317-333, (2013) · Zbl 1258.62072
[66] Srivastava, M. S.; Yanagihara, H., Testing the equality of several covariance matrices with fewer observations than the dimension, Journal of Multivariate Analysis, 101, 1319-1329, (2010) · Zbl 1186.62078
[67] Stock, J. H.; Watson, M. W., Forecasting using principal components from a large number of predictors, Journal of the American Statistical Association, 97, 1167-1179, (2002) · Zbl 1041.62081
[68] Tracy, C. A.; Widom, H., On orthogonal and symplectic matrix ensembles, Communications in Mathematical Physics, 177, 727-754, (1996) · Zbl 0851.60101
[69] Tracy, C. A.; Widom, H., The distribution of the largest eigenvalue in the Gaussian ensembles: \(\beta = 1, 2, 4\), CRM Series in Mathematical Physics, 4, 461-472, (2000)
[70] Vu, V. Q.; Cho, J.; Lei, J.; Rohe, K., Fantope projection and selection: A near-optimal convex relaxation of sparse PCA, (Advances in Neural Information Processing Systems, (2013)), 2670-2678
[71] Vu, V. Q.; Lei, J., Minimax sparse principal subspace estimation in high dimensions, The Annals of Statistics, 41, 2905-2947, (2013) · Zbl 1288.62103
[72] Wang, W.; Fan, J., Asymptotics of empirical eigenstructure for high dimensional spiked covariance, The Annals of Statistics, 45, 1342-1374, (2017) · Zbl 1373.62299
[73] Whittaker, J., Graphical models in applied multivariate statistics, (1990), John Wiley & Sons New York
[74] Wigner, E. P., Characteristic vectors of bordered matrices with infinite dimensions, Annals of Mathematics, 62, 548-564, (1955) · Zbl 0067.08403
[75] Wigner, E. P., On the distribution of the roots of certain symmetric matrices, Annals of Mathematics, 67, 325-328, (1958) · Zbl 0085.13203
[76] Xia, Y.; Cai, T.; Cai, T. T., Testing differential networks with applications to the detection of gene-gene interactions, Biometrika, 102, 247-266, (2015) · Zbl 1452.62392
[77] Yuan, M., High dimensional inverse covariance matrix estimation via linear programming, Journal of Machine Learning Research (JMLR), 11, 2261-2286, (2010) · Zbl 1242.62043
[78] Yuan, M.; Lin, Y., Model selection and estimation in the Gaussian graphical model, Biometrika, 94, 19-35, (2007) · Zbl 1142.62408
[79] Zou, H.; Hastie, T.; Tibshirani, R., Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15, 265-286, (2006)