Estimation of sparse directed acyclic graphs for multivariate counts data. (English) Zbl 1390.62263

Summary: Next-generation sequencing data, also called high-throughput sequencing data, are recorded as counts, whose distribution is generally far from normal. Under the assumption that the count data follow the Poisson log-normal distribution, this article provides an \(L_1\)-penalized likelihood framework and an efficient search algorithm to estimate the structure of sparse directed acyclic graphs (DAGs) for multivariate count data. In searching for the solution, we use iterative optimization procedures to estimate the adjacency matrix and the variance matrix of the latent variables. Simulation results show that the proposed method outperforms both the approach that assumes multivariate normal distributions and the log-transformation approach. They also show that the proposed method outperforms the rank-based PC method under sparse network or hub network structures. As a real data example, we demonstrate the efficiency of the proposed method in estimating the gene regulatory networks in an ovarian cancer study.
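
To make the model class in the summary concrete, the following is a minimal simulation sketch, not the authors' estimation procedure. It assumes (the summary does not state the exact parametrization) that the latent Gaussian variables follow a linear structural equation model on a DAG with adjacency matrix `adjacency` and independent noise, and that each count is Poisson with log-mean equal to an offset plus the latent value. The function name `simulate_pln_dag` and all parameter names are hypothetical.

```python
import numpy as np


def simulate_pln_dag(n, adjacency, noise_var, offset, seed=0):
    """Draw n count vectors from an assumed Poisson log-normal DAG model.

    adjacency[k, j] != 0 encodes an edge k -> j. The matrix must be strictly
    upper triangular, so the variable order is already a topological order.
    """
    rng = np.random.default_rng(seed)
    adjacency = np.asarray(adjacency, dtype=float)
    p = adjacency.shape[0]
    if not np.allclose(adjacency, np.triu(adjacency, k=1)):
        raise ValueError("adjacency must be strictly upper triangular")
    # Independent latent noise with one variance per node.
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(n, p))
    # Latent layer solves Z = Z A + E, i.e. Z = E (I - A)^{-1}.
    latent = np.linalg.solve((np.eye(p) - adjacency).T, noise.T).T
    # Observed counts: Poisson with log-mean offset + latent value.
    counts = rng.poisson(np.exp(offset + latent))
    return counts, latent


# Hypothetical three-node chain 0 -> 1 -> 2.
A = np.zeros((3, 3))
A[0, 1] = 0.8
A[1, 2] = -0.5
counts, latent = simulate_pln_dag(2000, A, noise_var=np.full(3, 0.25), offset=2.0)
print("sample means:    ", counts.mean(axis=0))
print("sample variances:", counts.var(axis=0))
```

Under these assumptions the sample variances exceed the sample means, which illustrates the over-dispersion that makes a plain normal or plain Poisson model a poor fit for such counts.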

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62H12 Estimation in multivariate analysis

Software:

L-BFGS-B; DESeq; Reactome
