
Edge selection for undirected graphs. (English) Zbl 07192718

Summary: This article explores an ‘Edge Selection’ procedure for fitting an undirected graph to a given data set. Undirected graphs are routinely used to represent, model and analyse associative relationships among the entities in a social, biological or genetic network. Our proposed method retains the computational efficiency of least angle regression while ensuring that the selected adjacency matrix is symmetric. Various local and global properties of the edge selection path are explored analytically. In particular, we identify a suitable parameter that controls the amount of shrinkage, and we consider several cross-validation techniques for choosing an accurate predictive model on the path. The proposed method is illustrated with a detailed simulation study involving models with various levels of sparsity and variability in the nodal degree distributions. Finally, our method is used to select undirected graphs from several real data sets: we employ it to identify the regulatory network of isoprenoid pathways from gene-expression data and to identify a genetic network from high-dimensional breast cancer study data.
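The symmetry requirement mentioned in the summary can be illustrated with a small sketch. When each node is regressed separately on the others, node i may select node j without j selecting i, so the raw selection matrix need not be symmetric. The code below is not the authors' Edge Selection algorithm, which according to the summary ensures symmetry itself; it only shows the common post-hoc "AND"/"OR" symmetrization used with node-wise neighbourhood selection. The function name `symmetrize`, the 0/1 input format and the example data are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[int]]


def symmetrize(selected: Matrix, rule: str = "and") -> Matrix:
    """Turn a possibly asymmetric 0/1 selection matrix into a symmetric adjacency matrix.

    selected[i][j] == 1 means node j was picked as a neighbour when node i was
    regressed on the remaining nodes (hypothetical input format).
    rule="and" keeps an edge only if both nodes picked each other;
    rule="or" keeps it if at least one of them did.
    """
    if rule not in ("and", "or"):
        raise ValueError("rule must be 'and' or 'or'")
    p = len(selected)
    adjacency = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            a, b = bool(selected[i][j]), bool(selected[j][i])
            edge = (a and b) if rule == "and" else (a or b)
            # Write both entries so the result is symmetric by construction.
            adjacency[i][j] = adjacency[j][i] = int(edge)
    return adjacency


def is_symmetric(matrix: Matrix) -> bool:
    """Check that matrix[i][j] == matrix[j][i] for all i, j."""
    p = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(p) for j in range(p))


# Made-up node-wise selections for three nodes:
# node 0 picks nodes 1 and 2, node 1 picks node 0, node 2 picks nobody.
selected = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]

and_graph = symmetrize(selected, "and")  # keeps only the mutual edge 0-1
or_graph = symmetrize(selected, "or")    # keeps edges 0-1 and 0-2
```

The "AND" rule gives the sparser graph and the "OR" rule the denser one; a procedure that selects edges rather than per-node neighbours avoids having to make this choice.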

MSC:

62-XX Statistics

Software:

ES; hglasso; glasso
