×

Sparse covariance matrix estimation by DCA-based algorithms. (English) Zbl 1418.62276

Summary: This letter proposes a novel approach using the \(\ell_0\)-norm regularization for the sparse covariance matrix estimation (SCME) problem. The objective function of SCME problem is composed of a nonconvex part and the \(\ell_0\) term, which is discontinuous and difficult to tackle. Appropriate DC (difference of convex functions) approximations of \(\ell_0\)-norm are used that result in approximation SCME problems that are still nonconvex. DC programming and DCA (DC algorithm), powerful tools in nonconvex programming framework, are investigated. Two DC formulations are proposed and corresponding DCA schemes developed. Two applications of the SCME problem that are considered are classification via sparse quadratic discriminant analysis and portfolio optimization. A careful empirical experiment is performed through simulated and real data sets to study the performance of the proposed algorithms. Numerical results showed their efficiency and their superiority compared with seven state-of-the-art methods.

MSC:

62J10 Analysis of variance and covariance (ANOVA)
90C26 Nonconvex programming, global optimization
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Banerjee, O., El Ghaoui, L., & d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. J. Mach. Learn. Res., 9, 485-516. · Zbl 1225.68149
[2] Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci., 2, 183-202. , · Zbl 1175.94009
[3] Bhatia, R. (1997). Matrix analysis. New York: Springer-Verlag. , · Zbl 0863.15001
[4] Bien, J., & Tibshirani, R. (2011). Sparse estimation of a covariance matrix. Biometrika, 98(4), 807-820. , · Zbl 1228.62063
[5] Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundat. Trends Mach. Learn., 3(1), 1-122. · Zbl 1229.90122
[6] Boyd, S., & Vanderberghe, L. (1979). Convex optimization. Cambridge: Cambridge University Press.
[7] Bradley, P. S., & Mangasarian, O. L. (1998). Feature selection via concave minimization and support vector machines. In Proceedings of the Fifteenth International Conference on Machine Learning (pp. 82-90). San Francisco: Morgan Kaufmann.
[8] Cai, T., Liu, W., & Luo, X. (2011). A constrained ##img## minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106, 594-607. , · Zbl 1232.62087
[9] Chambolle, A., Devore, R. A., Lee, N. Y., & Lucier, B. J. (1998). Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process, 7, 319-335. , · Zbl 0993.94507
[10] Chaudhuri, S., Drton, M., & Richardson, T. S. (2007). Estimation of a covariance matrix with zeros. Biometrika, 94, 199-216. , · Zbl 1143.62032
[11] Collobert, R., Sinz, F., Weston, J., & Bottou, L. (2006). Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning (pp. 201-208). New York: ACM Press. ,
[12] Danaher, P., Wang, P., & Witten, D. M. (2014). The joint graphical Lasso for inverse covariance estimation across multiple classes. J. R. Statist. Soc. B, 76, 373-397. , · Zbl 07555455
[13] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Series B: Stat. Methodol., 39, 1-38. · Zbl 0364.62022
[14] Deng, X., & Tsui, K. W. (2013). Penalized covariance matrix estimation using a matrix-logarithm transformation. Journal of Computational and Graphical Statistics, 22(2), 494-512. ,
[15] Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G., & West, M. (2004). Sparse graphical models for exploring gene expression data. J. Multiv. Anal., 90, 196-212. , · Zbl 1047.62104
[16] Efron, B. (2010). Correlated ##img##-values and the accuracy of large-scale statistical estimates. J. Am. Statist. Asso., 105, 1042-1055. , · Zbl 1390.62139
[17] Fan, J., Liao, Y., & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Statist. Soc. B, 74, 603-680. , · Zbl 1411.62138
[18] Friedman, J., Hastie, T. J., & Tibshirani, R. J. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432-441. , · Zbl 1143.62076
[19] Guo, Y., Hastie, T., & Tibshirani, R. (2007). Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 8(1), 86-100. , · Zbl 1170.62382
[20] Jagannathan, R., & Ma, T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraints helps. Journal of Finance, 58, 1651-1684. ,
[21] Kourtis, A., Dotsis, G., & Markellos, R. N. (2012). Parameter uncertainty in portfolio selection: Shrinking the inverse covariance matrix. Journal of Banking and Finance, 36, 2522-2531. ,
[22] Krause, N., & Singer, Y. (2004). Leveraging the margin more carefully. In Proceedings of the Twenty First International Conference on Machine Learning. New York: ACM Press. ,
[23] Lai, T. L., Xing, H., & Chen, Z. (2011). Mean-variance portfolio optimization when means and covariances are unknown. Annals of Applied Statistics, 5, 798-823. , · Zbl 1454.62303
[24] Lam, C., & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Annals of Statistics, 37, 4254-4278. , · Zbl 1191.62101
[25] Le Hoai, M., Le Thi, H. A., Pham Dinh, T., & Huynh, V. N. (2013). Block clustering based on difference of convex functions (DC) programming and DC algorithms. Neural Computation, 25, 259-278. · Zbl 1418.62253
[26] Le Thi, H. A., Le Hoai, M., Nguyen, V. V., & Pham Dinh, T. (2008). A DC programming approach for feature selection in support vector machines learning. Journal of Advances in Data Analysis and Classification, 2(3), 259-278. , · Zbl 1284.90057
[27] Le Thi, H. A., Le Hoai, M., & Pham Dinh, T. (2007). Optimization based DC programming and DCA for hierarchical clustering. European Journal of Operational Research, 183, 1067-1085. , · Zbl 1149.90117
[28] Le Thi, H. A., Le Hoai, M., & Pham Dinh, T. (2014). New and efficient DCA based algorithms for minimum sum-of-squares clustering. Pattern Recognition, 47, 388-401. , · Zbl 1326.68225
[29] Le Thi, H. A., Le Hoai, M., & Pham Dinh, T. (2015). Feature selection in machine learning: An exact penalty approaching a difference of convex function algorithm. Machine Learning, 101, 163-186. , · Zbl 1343.68201
[30] Le Thi, H. A., & Pham Dinh, T. (2005). The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Annals of Operations Research, 133, 23-46. , · Zbl 1116.90122
[31] Le Thi, H. A., Pham Dinh, T., & Huynh, V. N. (2012). Exact penalty and error bounds in DC programming. Journal of Global Optimization, 52(3), 509-535. , · Zbl 1242.49037
[32] Le Thi, H. A., Pham Dinh, T., Le Hoai, M., & Vo Xuan, T. (2015). DC approximation approaches for sparse optimization. European Journal of Operational Research, 244, 26-44. , · Zbl 1346.90819
[33] Le Thi, H. A., Vo Xuan, T., & Pham Dinh, T. (2014). Feature selection for linear SVMs under uncertain data: Robust optimization based on difference of convex functions algorithms. Neural Networks, 59, 36-50. , · Zbl 1327.90236
[34] Ledoit, O., & Wolf, M. (2003). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10, 603-621. ,
[35] Ledoit, O., & Wolf, M. (2004). Honey, I shrunk the sample covariance matrix. J. Portfolio Management, 30, 110-119. ,
[36] Leek, J., & Storey, J. (2008). A general framework for multiple testing dependence. Proc. Nat. Acad. Sci. USA, 105, 18718-18723. , · Zbl 1359.62202
[37] Liu, H., Wang, L., & Zhao, T. (2014). Sparse covariance matrix estimation with eigenvalue contraints. Journal of Computational and Graphical Statistics, 23(2), 439-459. ,
[38] Liu, Y., Shen, X., & Doss, H. (2005). Multicategory ##img##-learning and support vector machine: Computational tools. Journal of Computational and Graphical Statistics, 14, 219-236. ,
[39] Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press. · Zbl 0432.62029
[40] Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7, 77-91. ,
[41] Meinshausen, N., & Buhlmann, P. (2006). High dimensional graphs and variable selection with the Lasso. Ann. Statist., 34, 1436-1462. , · Zbl 1113.62082
[42] Ong, C. S., & Le Thi, H. A. (2013). Learning sparse classifers with difference of convex functions algorithms. Optimization Methods and Software, 28(4), 830-854. , · Zbl 1282.90181
[43] Peleg, D., & Meir, R. (2008). A bilinear formulation for vector sparsity optimization. Signal Processing, 8(2), 375-389. , · Zbl 1186.94273
[44] Pham Dinh, T., & Le Thi, H. A. (1997). Convex analysis approach to D.C. programming: Theory, algorithms and applications. Acta Mathematica Vietnamica, 22(1), 289-355. · Zbl 0895.90152
[45] Pham Dinh, T., & Le Thi, H. A. (1998). A DC optimization algorithm for solving the trust-region subproblem. SIAM Journal of Optimization, 8(2), 476-505. , · Zbl 0913.65054
[46] Pham Dinh, T., & Le Thi, H. A. (2014). Recent advances in DC programming and DCA. In Lecture Notes in Computer Science: Vol. 8342, Transactions on Computational Collective Intelligence (pp. 1-37). Berlin: Springer-Verlag. , · Zbl 1283.68038
[47] Rothman, A. (2012). Positive definite estimators of large covariance matrices. Biometrika, 99, 733-740. , · Zbl 1437.62595
[48] Rothman, A., Levina, E., & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Statist., 2, 494-515. , · Zbl 1320.62135
[49] Rothman, A. J., Levina, E., & Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Am. Statist. Assoc., 104, 177-186. , · Zbl 1388.62170
[50] Schafer, J., & Strimmer, K. (2005a). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, 21, 754-764. ,
[51] Schafer, J., & Strimmer, K. (2005b). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4, 1-30. ,
[52] Sentana, E. (2009). The econometrics of mean-variance efficiency tests: A survey. Econometr. J., 12, 65-101. , · Zbl 1178.91232
[53] Sun, J., & Zhao, H. (2015). The application of sparse estimation of covariance matrix to quadratic discriminant analysis. BMC Bioinformatics, 16, 1-9. ,
[54] Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2004). Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statistical Science, 18(1), 104-117. , · Zbl 1048.62109
[55] Toh, H., & Horimoto, K. (2002). Inference of a genetic network by a combined approach of cluster analysis and graphical gaussian modeling. Bioinformatics, 18, 287-297. ,
[56] Witten, D. M., & Tibshirani, R. (2011). Penalized classification using Fisher’s linear discriminant. Journal Royal Statistical Society B, 73, 753-772. , · Zbl 1228.62079
[57] Xiong, H., Goulding, E., Carlson, E. J., Tecott, L., McCulloch, C. E., & Sen, S. (2011). A flexible estimating equations approach for mapping function-valued traits. Genetics, 189, 305-316. ,
[58] Xue, L., Ma, S., & Zuo, H. (2012). Positive-definite ##img##-penalized estimation of large covariance matrices. Journal of the American Statistical Association, 107, 1480-1491. , · Zbl 1258.62063
[59] Yap, J. S., Fan, J., & Wu, R. (2009). Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci. Biometrics, 65, 1068-1077. , · Zbl 1181.62186
[60] Yuan, M., & Lin, Y. (2007). Model selection and estimation in the gaussian graphical model. Biometrika, 94, 19-35. , · Zbl 1142.62408
[61] Yuille, A. L., & Rangarajan, A. (2003). The concave-convex procedure (CCCP). Neural Comput., 15, 915-936. , · Zbl 1022.68112
[62] Zhang, T., & Zou, H. (2014). Sparse precision matrix estimation via Lasso penalized d-trace loss. Biometrika, 101, 103-120. , · Zbl 1285.62063
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.