Large covariance estimation by thresholding principal orthogonal complements. With discussion and authors’ reply. (English) Zbl 1411.62138

Summary: The paper deals with the estimation of a high-dimensional covariance matrix with a conditional sparsity structure and fast-diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow some cross-sectional correlation to remain even after the common but unobservable factors are taken out. We introduce the principal orthogonal complement thresholding method, ‘POET’, to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix, the thresholding estimator and the adaptive thresholding estimator as special cases. We provide mathematical insight into when factor analysis is approximately the same as principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and of the conditional sparse covariance matrix are studied under various norms. It is shown that the effect of estimating the unknown factors vanishes as the dimensionality increases. Uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application to portfolio allocation is presented.
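The POET construction summarized above can be illustrated with a short NumPy sketch. It assumes the data arrive as a T × p array, treats the number of factors K as known, and soft-thresholds the off-diagonal entries of the principal orthogonal complement with the entry-adaptive threshold τ_ij = C·ω_T·√(r_ii r_jj), where ω_T = 1/√p + √(log p / T). The threshold form, the soft-thresholding rule, the default C and the function names `poet` and `soft_threshold` are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise soft thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def poet(Y, K, C=0.5):
    """Hypothetical sketch of a POET-style covariance estimator.

    Y : (T, p) array holding T observations of a p-dimensional vector.
    K : number of factors, assumed known in this sketch.
    C : threshold constant (illustrative default, not a recommended value).
    """
    T, p = Y.shape
    Yc = Y - Y.mean(axis=0)            # demean each coordinate
    S = Yc.T @ Yc / T                  # sample covariance with 1/T scaling

    # eigh returns eigenvalues in ascending order, so reverse both outputs.
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]

    # Low-rank part built from the K leading principal components.
    low_rank = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T
    # Principal orthogonal complement: sum of the remaining p - K components.
    R = S - low_rank

    # Assumed entry-adaptive threshold tau_ij = C * omega_T * sqrt(r_ii * r_jj).
    omega_T = 1.0 / np.sqrt(p) + np.sqrt(np.log(p) / T)
    d = np.sqrt(np.clip(np.diag(R), 0.0, None))
    tau = C * omega_T * np.outer(d, d)

    R_thr = soft_threshold(R, tau)
    np.fill_diagonal(R_thr, np.diag(R))  # the diagonal is never thresholded
    return low_rank + R_thr
```

In this sketch, K = 0 with C = 0 returns the sample covariance matrix, K = 0 with C > 0 gives a plain correlation-based thresholding estimator, and a very large C zeroes every off-diagonal residual entry, leaving a factor-based estimator (K leading principal components plus a diagonal residual). The variance-based adaptive thresholding of Cai and Liu [16] would need a different choice of τ_ij and is not implemented here.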

MSC:

62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis

Software:

PMA; ElemStatLearn

References:

[1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012) Noisy matrix decomposition via convex relaxation: optimal rates in high dimensions. Ann. Statist., 40, 1171–1197. · Zbl 1274.62219
[2] Ahn, S., Lee, Y. and Schmidt, P. (2001) GMM estimation of linear panel data models with time‐varying individual effects. J. Econmetr., 101, 219–255. · Zbl 0966.62091
[3] Alessi, L., Barigozzi, M. and Capasso, M. (2010) Improved penalization for determining the number of factors in approximate factor models. Statist. Probab. Lett., 80, 1806–1813. · Zbl 1202.62081
[4] Amini, A. A. and Wainwright, M. J. (2009) High‐dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist., 37, 2877–2921. · Zbl 1173.62049
[5] Antoniadis, A. and Fan, J. (2001) Regularized wavelet approximations. J. Am. Statist. Ass., 96, 939–967. · Zbl 1072.62561
[6] d’Aspremont, A., Bach, F. and El Ghaoui, L. (2008) Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res., 9, 1269–1294. · Zbl 1225.68170
[7] Athreya, K. and Lahiri, S. (2006) Measure Theory and Probability Theory. New York: Springer. · Zbl 1125.60001
[8] Bai, J. (2003) Inferential theory for factor models of large dimensions. Econometrica, 71, 135–171. · Zbl 1136.62354
[9] Bai, J. and Ng, S. (2002) Determining the number of factors in approximate factor models. Econometrica, 70, 191–221. · Zbl 1103.91399
[10] Bai, J. and Ng, S. (2008) Large dimensional factor analysis. Found. Trends Econmetr., 3, 89–163.
[11] Bai, J. and Shi, S. (2011) Estimating high dimensional covariance matrices and its applications. Ann. Econ. Finan., 12, 199–215.
[12] Bickel, P. and Levina, E. (2008) Covariance regularization by thresholding. Ann. Statist., 36, 2577–2604. · Zbl 1196.62062
[13] Birnbaum, A., Johnstone, I., Nadler, B. and Paul, D. (2012) Minimax bounds for sparse PCA with noisy high‐dimensional data. Ann. Statist., to be published. · Zbl 1292.62071
[14] Boivin, J. and Ng, S. (2006) Are more data always better for factor analysis? J. Econmetr., 132, 169–194. · Zbl 1337.62345
[15] Cai, J., Candès, E. and Shen, Z. (2008) A singular value thresholding algorithm for matrix completion. SIAM J. Optimizn, 20, 1956–1982. · Zbl 1201.90155
[16] Cai, T. and Liu, W. (2011) Adaptive thresholding for sparse covariance matrix estimation. J. Am. Statist. Ass., 106, 672–684. · Zbl 1232.62086
[17] Cai, T. and Zhou, H. (2012) Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist., 40, 2389–2420. · Zbl 1373.62247
[18] Candès, E., Li, X., Ma, Y. and Wright, J. (2011) Robust principal component analysis? J. Ass. Comput. Mach., 58, 3.
[19] Carvalho, C., Chang, J., Lucas, J., Nevins, J., Wang, Q. and West, M. (2008) High‐dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Statist. Ass., 103, 1438–1456. · Zbl 1286.62091
[20] Chamberlain, G. and Rothschild, M. (1983) Arbitrage, factor structure and mean‐variance analysis in large asset markets. Econometrica, 51, 1305–1324. · Zbl 0523.90017
[21] Davis, C. and Kahan, W. (1970) The rotation of eigenvectors by a perturbation III. SIAM J. Numer. Anal., 7, 1–46. · Zbl 0198.47201
[22] Doz, C., Giannone, D. and Reichlin, L. (2011) A two‐step estimator for large approximate dynamic factor models based on Kalman filtering. J. Econmetr., 164, 188–205. · Zbl 1441.62671
[23] Efron, B. (2007) Correlation and large‐scale simultaneous significance testing. J. Am. Statist. Ass., 102, 93–103. · Zbl 1284.62340
[24] Efron, B. (2010) Correlated z‐values and the accuracy of large‐scale statistical estimates. J. Am. Statist. Ass., 105, 1042–1055. · Zbl 1390.62139
[25] Fama, E. and French, K. (1992) The cross‐section of expected stock returns. J. Finan., 47, 427–465.
[26] Fan, J., Fan, Y. and Lv, J. (2008) High dimensional covariance matrix estimation using a factor model. J. Econmetr., 147, 186–197. · Zbl 1429.62185
[27] Fan, J., Han, X. and Gu, W. (2012) Control of the false discovery rate under arbitrary covariance dependence (with discussion). J. Am. Statist. Ass., 107, 1019–1048.
[28] Fan, J., Liao, Y. and Mincheva, M. (2011a) High dimensional covariance matrix estimation in approximate factor models. Ann. Statist., 39, 3320–3356. · Zbl 1246.62151
[29] Fan, J., Liao, Y. and Mincheva, M. (2011b) Large covariance estimation by thresholding principal orthogonal complements. Preprint arxiv.org/pdf/1201.0175.pdf.
[30] Fan, J., Zhang, J. and Yu, K. (2012) Vast portfolio selection with gross‐exposure constraints. J. Am. Statist. Ass., 107, 592–606. · Zbl 1261.62091
[31] Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000) The generalized dynamic factor model: identification and estimation. Rev. Econ. Statist., 82, 540–554.
[32] Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2004) The generalized dynamic factor model: consistency and rates. J. Econmetr., 119, 231–255. · Zbl 1282.91267
[33] Forni, M. and Lippi, M. (2001) The generalized dynamic factor model: representation theory. Econmetr. Theor., 17, 1113–1141. · Zbl 1181.62189
[34] Fryzlewicz, P. (2012) High‐dimensional volatility matrix estimation via wavelets and thresholding. Manuscript. London School of Economics and Political Science, London. · Zbl 1452.62764
[35] Hallin, M. and Liška, R. (2007) Determining the number of factors in the general dynamic factor model. J. Am. Statist. Ass., 102, 603–617. · Zbl 1172.62339
[36] Hallin, M. and Liška, R. (2011) Dynamic factors in the presence of blocks. J. Econmetr., 163, 29–41. · Zbl 1441.62716
[37] Hastie, T. J., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. New York: Springer. · Zbl 1273.62005
[38] James, W. and Stein, C. (1961) Estimation with quadratic loss. In Proc. 4th Berkeley Symp. Mathematical Statistics and Probability, pp. 361–379. Berkeley: University of California Press. · Zbl 1281.62026
[39] Johnstone, I. M. (2001) On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29, 295–327. · Zbl 1016.62078
[40] Johnstone, I. M. and Lu, A. Y. (2009) On consistency and sparsity for principal components analysis in high dimensions. J. Am. Statist. Ass., 104, 682–693. · Zbl 1388.62174
[41] Jung, S. and Marron, J. S. (2009) PCA consistency in high dimension, low sample size context. Ann. Statist., 37, 4104–4130. · Zbl 1191.62108
[42] Kapetanios, G. (2010) A testing procedure for determining the number of factors in approximate factor models with large datasets. J. Bus. Econ. Statist., 28, 397–409. · Zbl 1214.62068
[43] Lam, C. and Fan, J. (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist., 37, 4254–4278. · Zbl 1191.62101
[44] Lawley, D. and Maxwell, A. (1971) Factor Analysis as a Statistical Method, 2nd edn. London: Butterworth. · Zbl 0251.62042
[45] Leek, J. and Storey, J. (2008) A general framework for multiple testing dependence. Proc. Natn. Acad. Sci. USA, 105, 18718–18723. · Zbl 1359.62202
[46] Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M. and Ma, Y. (2009) Fast convex optimization algorithms for exact recovery of a corrupted low‐rank matrix. Int. Wrkshp Computational Advances in Multi‐Sensor Adaptive Processing, Aruba.
[47] Luo, X. (2011) High dimensional low rank and sparse covariance matrix estimation via convex minimization. Manuscript. University of Pennsylvania, Philadelphia.
[48] Ma, Z. (2013) Sparse principal components analysis and iterative thresholding. Ann. Statist., to be published. · Zbl 1267.62074
[49] Meinshausen, N. and Bühlmann, P. (2006) High dimensional graphs and variable selection with the Lasso. Ann. Statist., 34, 1436–1462. · Zbl 1113.62082
[50] Onatski, A. (2010) Determining the number of factors from empirical distribution of eigenvalues. Rev. Econ. Statist., 92, 1004–1016.
[51] Pati, D., Bhattacharya, A., Pillai, N. and Dunson, D. (2012) Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Manuscript. Duke University, Durham. · Zbl 1305.62124
[52] Paul, D. (2007) Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sin., 17, 1617–1642. · Zbl 1134.62029
[53] Pesaran, M. H. (2006) Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica, 74, 967–1012. · Zbl 1152.91718
[54] Pesaran, M. H. and Yamagata, T. (2012) Testing CAPM with a large number of assets. American Finance Association San Diego Meetings Paper. (Available from http://ssrn.com/abstracts.)
[55] Phan, Q. (2012) On the sparsity assumption of the idiosyncratic errors covariance matrix—Support from the FTSE 100 stock returns. Manuscript. University of Warwick, Coventry.
[56] Ross, S. A. (1976) The arbitrage theory of capital asset pricing. J. Econ. Theor., 13, 341–360.
[57] Rothman, A., Levina, E. and Zhu, J. (2009) Generalized thresholding of large covariance matrices. J. Am. Statist. Ass., 104, 177–186. · Zbl 1388.62170
[58] Sentana, E. (2009) The econometrics of mean‐variance efficiency tests: a survey. Econometr. J., 12, 65–101. · Zbl 1178.91232
[59] Sharpe, W. (1964) Capital asset prices: a theory of market equilibrium under conditions of risk. J. Finan., 19, 425–442.
[60] Shen, H. and Huang, J. (2008) Sparse principal component analysis via regularized low rank matrix approximation. J. Multiv. Anal., 99, 1015–1034. · Zbl 1141.62049
[61] Stock, J. and Watson, M. (1998) Diffusion Indexes. Working Paper 6702. National Bureau of Economic Research, Cambridge.
[62] Stock, J. and Watson, M. (2002) Forecasting using principal components from a large number of predictors. J. Am. Statist. Ass., 97, 1167–1179. · Zbl 1041.62081
[63] Witten, D. M., Tibshirani, R. and Hastie, T. (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515–534.
[64] Wright, J., Peng, Y., Ma, Y., Ganesh, A. and Rao, S. (2009) Robust principal component analysis: exact recovery of corrupted low‐rank matrices by convex optimization. Neural Inform. Process. Syst.
[65] Xiong, H., Goulding, E. H., Carlson, E. J., Tecott, L. H., McCulloch, C. E. and Sen, S. (2011) A flexible estimating equations approach for mapping function‐valued traits. Genetics, 189, 305–316.
[66] Yap, J. S., Fan, J. and Wu, R. (2009) Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci. Biometrics, 65, 1068–1077. · Zbl 1181.62186
[67] Zhang, Y.