Adjusting for high-dimensional covariates in sparse precision matrix estimation by \(\ell_1\)-penalization. (English) Zbl 1277.62146

Summary: Motivated by the analysis of genetical genomic data, we consider the problem of estimating a high-dimensional sparse precision matrix adjusting for a possibly large number of covariates, where the covariates can affect the mean value of the random vector. We develop a two-stage estimation procedure that first identifies the relevant covariates affecting the means by a joint \(\ell_1\) penalization. The estimated regression coefficients are then used to estimate the mean values in a multivariate sub-Gaussian model, and the sparse precision matrix is estimated through an \(\ell_1\)-penalized log-determinant Bregman divergence.
Under the multivariate normal assumption, the precision matrix has the interpretation of a conditional Gaussian graphical model. We show that under some regularity conditions, the estimates of the regression coefficients are consistent in the element-wise \(\ell_\infty\) norm, the Frobenius norm and the spectral norm, even when \(p \gg n\) and \(q \gg n\). We also show that, with probability converging to one, the estimate of the precision matrix correctly recovers the zero pattern of the true precision matrix. We illustrate our theoretical results via simulations and demonstrate that the method can lead to improved estimates of the precision matrix. We apply the method to an analysis of a yeast genetical genomic data set.
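The two-stage idea in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' estimator: scikit-learn's `Lasso` (fitted to a matrix-valued response) stands in for the joint \(\ell_1\)-penalized mean regression, and `GraphicalLasso` applied to the residuals stands in for the \(\ell_1\)-penalized log-determinant step. The function name `two_stage_precision`, the tuning values `alpha_mean` and `alpha_prec`, and the simulated data are all hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.linear_model import Lasso


def two_stage_precision(X, Y, alpha_mean=0.1, alpha_prec=0.1):
    """Illustrative two-stage fit (hypothetical helper, not the paper's code)."""
    # Stage 1: l1-penalized regression of the p responses on the q covariates.
    # With a 2-D Y, the same penalty level applies to every coefficient.
    reg = Lasso(alpha=alpha_mean, max_iter=10000).fit(X, Y)
    residuals = Y - reg.predict(X)
    # Stage 2: l1-penalized log-determinant (graphical lasso) fit on the
    # covariate-adjusted residuals.
    gl = GraphicalLasso(alpha=alpha_prec, max_iter=200).fit(residuals)
    return reg.coef_, gl.precision_


# Simulated data (assumed sizes): n samples, q covariates, p responses.
rng = np.random.default_rng(0)
n, q, p = 200, 10, 5
X = rng.standard_normal((n, q))
B = np.zeros((q, p))
B[0, 0] = 2.0   # covariate 0 shifts the mean of response 0
B[1, 2] = -1.5  # covariate 1 shifts the mean of response 2
# Sparse (tridiagonal) true precision matrix, positive definite by construction.
Omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
E = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n)
Y = X @ B + E

coef, precision = two_stage_precision(X, Y)
print(coef.shape, precision.shape)
```

Here `coef` has one row per response and one column per covariate, and the zero pattern of `precision` is the estimated conditional dependence graph. In practice both penalty levels would be chosen by cross-validation or an information criterion rather than fixed in advance.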


62H12 Estimation in multivariate analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
62Q05 Statistical tables
65C60 Computational problems in statistics (MSC2010)



