
Fused multiple graphical lasso. (English) Zbl 1320.90055

Summary: In this paper, we consider the problem of estimating multiple graphical models simultaneously using the fused lasso penalty, which encourages adjacent graphs to share similar structures. A motivating example is the analysis of brain networks in Alzheimer’s disease using neuroimaging data. Specifically, we may wish to estimate a brain network for normal controls (NC), one for patients with mild cognitive impairment (MCI), and one for Alzheimer’s patients (AD). We expect the brain networks for NC and MCI to share common structures without being identical to each other, and similarly for the networks for MCI and AD. The proposed formulation can be solved using a second-order method. Our key technical contribution is to establish the necessary and sufficient condition for the graphs to be decomposable. Based on this key property, we present a simple screening rule that decomposes the large graphs into small subgraphs and allows efficient estimation of multiple independent (small) subgraphs, dramatically reducing the computational cost. Experiments on both synthetic and real data demonstrate the effectiveness and efficiency of the proposed approach.
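To make the summary concrete: in the standard fused graphical lasso formulation (the paper’s exact parametrization may differ in details), given sample covariance matrices \(S^{(1)},\dots,S^{(K)}\) for the \(K\) classes, one estimates precision matrices \(\Theta^{(1)},\dots,\Theta^{(K)}\) by minimizing

\[
\min_{\Theta^{(k)} \succ 0}\; \sum_{k=1}^{K} \Bigl( -\log\det \Theta^{(k)} + \operatorname{tr}\bigl(S^{(k)}\Theta^{(k)}\bigr) \Bigr) \;+\; \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} \bigl|\Theta^{(k)}_{ij}\bigr| \;+\; \lambda_2 \sum_{k=1}^{K-1} \sum_{i \neq j} \bigl|\Theta^{(k)}_{ij} - \Theta^{(k+1)}_{ij}\bigr|,
\]

where \(\lambda_1\) controls sparsity within each graph and \(\lambda_2\) fuses adjacent graphs. The screening idea mentioned above can be sketched in code. The sketch below is a simplified illustration, not the paper’s exact necessary-and-sufficient condition (which also accounts for the fused penalty \(\lambda_2\)): an edge between variables \(i\) and \(j\) survives screening if the corresponding off-diagonal covariance entry exceeds the threshold in at least one class, and the connected components of the surviving edges define independent subproblems. The function name and threshold rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def screening_components(covs, lam1):
    """Partition variables into connected components by thresholding.

    covs: list of K sample covariance matrices, each p x p (numpy arrays).
    lam1: sparsity threshold.

    Simplified screening sketch: variables i and j share an edge if
    |S_k[i, j]| > lam1 for at least one class k. The paper's exact
    condition also involves the fused penalty; this is an illustration.
    """
    p = covs[0].shape[0]
    # An edge survives screening if any class exceeds the threshold.
    adj = np.zeros((p, p), dtype=bool)
    for S in covs:
        off = np.abs(S) > lam1
        np.fill_diagonal(off, False)
        adj |= off

    # Union-find over the surviving edges to get connected components.
    parent = list(range(p))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(p):
        for j in range(i + 1, p):
            if adj[i, j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    # Group variable indices by their component root.
    comps = {}
    for i in range(p):
        comps.setdefault(find(i), []).append(i)
    return list(comps.values())
```

Each returned component can then be estimated as a separate, much smaller fused graphical lasso problem, which is the source of the computational savings described in the summary.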

MSC:

90C22 Semidefinite programming
90C25 Convex programming
90C47 Minimax problems in mathematical programming
65K05 Numerical mathematical programming methods
62J10 Analysis of variance and covariance (ANOVA)
