×

High-dimensional consistency in score-based and hybrid structure learning. (English) Zbl 1411.62144

The main approaches for learning Bayesian networks can be classified as constrained-based, score-based or hybrid methods. Constrained-based methods, such as the PC algorithm and its variants, are the most popular applied methods in high-dimensional methods. This popularity is explained since score-based and hybrid methods lack consistency results in high-dimensional settings and/or they do not scale well to large graphs. In this paper, the authors propose a high-dimensional consistency of Greedy equivalence search (GES), which is a popular score-based method. Moreover, they propose new hybrid algorithms based on GES that are consistent in several sparse high-dimensional settings and scale well to large sparse graphs. In particular, the authors show that the consistency of hybrid algorithms based on GES can be achieved by imposing a restriction on the search space that changes depending on the current state of an algorithm. This new method is called adaptively greedy equivalence search (ARGES). The consistency of GES and ARGES is proved in several sparse high-dimensional settings (multivariate Gaussian, linear structural equation models with sub-Gaussian error variables, nonparanormal setting). A simulation study indicates that both GES and ARGES generally outperform the PC algorithm. It is worth to mention that all proofs and additional simulation results can be found in the Supplementary material, while an implementation of ARGES is available in the R-package pcalg.

MSC:

62H12 Estimation in multivariate analysis
68T05 Learning and adaptive systems in artificial intelligence
PDFBibTeX XMLCite
Full Text: DOI arXiv Euclid

References:

[1] Alonso-Barba, J. I., delaOssa, L., Gámez, J. A. and Puerta, J. M. (2013). Scaling up the greedy equivalence search algorithm by constraining the search space of equivalence classes. Internat. J. Approx. Reason.54 429–451. · Zbl 1264.68123
[2] Anandkumar, A., Tan, V. Y. F., Huang, F. and Willsky, A. S. (2012). High-dimensional Gaussian graphical model selection: Walk summability and local separation criterion. J. Mach. Learn. Res.13 2293–2337. · Zbl 1433.68322
[3] Andersson, S. A., Madigan, D. and Perlman, M. D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist.25 505–541. · Zbl 0876.60095
[4] Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res.9 485–516. · Zbl 1225.68149
[5] Cai, T., Liu, W. and Luo, X. (2011). A constrained \(ℓ_{1}\) minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc.106 594–607. · Zbl 1232.62087
[6] Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika95 759–771. · Zbl 1437.62415
[7] Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. J. Mach. Learn. Res.2 445–498. · Zbl 1007.68179
[8] Chickering, D. M. (2003). Optimal structure identification with greedy search. J. Mach. Learn. Res.3 507–554. · Zbl 1084.68519
[9] Chickering, D. M. and Meek, C. (2002). Finding optimal Bayesian networks. In UAI 2002.
[10] Chickering, D. M. and Meek, C. (2015). Selective greedy equivalence search: Finding optimal Bayesian networks using a polynomial number of score evaluations. In UAI 2015.
[11] Chow, C. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory14 462–467. · Zbl 0165.22305
[12] Colombo, D. and Maathuis, M. H. (2014). Order-independent constraint-based causal structure learning. J. Mach. Learn. Res.15 3741–3782. · Zbl 1312.68165
[13] de Campos, L. M. (1998). Independency relationships and learning algorithms for singly connected networks. J. Exp. Theor. Artif. Intell.10 511–549. · Zbl 1049.68621
[14] Doran, G., Muandet, K., Zhang, K. and Schölkopf, B. (2014). A permutation-based kernel conditional independence test. In UAI 2014.
[15] Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recogn. Lett.27 861–874.
[16] Foygel, R. and Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In NIPS 2010.
[17] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics9 432–441. · Zbl 1143.62076
[18] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw.33 1–22.
[19] Gao, B. and Cui, Y. (2015). Learning directed acyclic graphical structures with genetical genomics data. Bioinformatics31 3953–3960.
[20] Ha, M. J., Sun, W. and Xie, J. (2016). PenPC: A two-step approach to estimate the skeletons of high-dimensional directed acyclic graphs. Biometrics72 146–155. · Zbl 1393.62065
[21] Harris, N. and Drton, M. (2013). PC algorithm for nonparanormal graphical models. J. Mach. Learn. Res.14 3365–3383. · Zbl 1318.62197
[22] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York. · Zbl 1273.62005
[23] Hauser, A. and Bühlmann, P. (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. J. Mach. Learn. Res.13 2409–2464. · Zbl 1433.68346
[24] Huete, J. F. and de Campos, L. M. (1993). Learning causal polytrees. In ECSQARU 1993.
[25] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res.8 613–636. · Zbl 1222.68229
[26] Kalisch, M., Mächler, M., Colombo, D., Maathuis, M. and Bühlmann, P. (2012). Causal inference using graphical models with the R package pcalg. J. Stat. Softw.47 1–26.
[27] Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, MA. · Zbl 1183.68483
[28] Le, T. D., Liu, L., Tsykin, A., Goodall, G. J., Liu, B., Sun, B.-Y. and Li, J. (2013). Inferring microRNA-mRNA causal regulatory relationships from expression data. Bioinformatics29 765–771.
[29] Liu, H., Han, F., Yuan, M., Lafferty, J. and Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist.40 2293–2326. · Zbl 1297.62073
[30] Maathuis, M. H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Ann. Statist.37 3133–3164. · Zbl 1191.62118
[31] Maathuis, M. H., Colombo, D., Kalisch, M. and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nat. Methods7 247–248.
[32] Meek, C. (1995). Causal inference and causal explanation with background knowledge. In UAI 1995.
[33] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist.34 1436–1462. · Zbl 1113.62082
[34] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B. Stat. Methodol.72 417–473.
[35] Nandy, P., Hauser, A. and Maathuis, M. H. (2018). Supplement to “High-dimensional consistency in score-based and hybrid structure learning.” DOI:10.1214/17-AOS1654SUPP.
[36] Nandy, P., Maathuis, M. H. and Richardson, T. S. (2017). Estimating the effect of joint interventions from observational data in sparse high-dimensional settings. Ann. Statist.45 647–674. · Zbl 1426.62286
[37] Ouerd, M., Oommen, B. J. and Matwin, S. (2004). A formal approach to using data distributions for building causal polytree structures. Inform. Sci.168 111–132. · Zbl 1105.68092
[38] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge. · Zbl 1188.68291
[39] Perkovic, E., Textor, J., Kalisch, M. and Maathuis, M. (2015a). A complete adjustment criterion. In UAI 2015.
[40] Perkovic, E., Textor, J., Kalisch, M. and Maathuis, M. (2015b). Complete graphical characterization and construction of adjustment sets in markov equivalence classes of ancestral graphs. Preprint. Available at arXiv:1606.06903.
[41] Ramsey, J. D. (2015). Scaling up greedy equivalence search for continuous variables. Preprint. Available at arXiv:1507.07749.
[42] Ravikumar, P. K., Raskutti, G., Wainwright, M. J. and Yu, B. (2008). Model selection in Gaussian graphical models: High-dimensional consistency of \(ℓ_{1}\)-regularized MLE. In NIPS 2008.
[43] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing \(ℓ_{1}\)-penalized log-determinant divergence. Electron. J. Stat.5 935–980. · Zbl 1274.62190
[44] Rebane, G. and Pearl, J. (1987). The recovery of causal poly-trees from statistical data. In UAI 1987.
[45] Schmidberger, M., Lennert, S. and Mansmann, U. (2011). Conceptual aspects of large meta-analyses with publicly available microarray data: A case study in oncology. Bioinform. Biol. Insights5 13–39.
[46] Schmidt, M., Niculescu-Mizil, A. and Murphy, K. (2007). Learning graphical model structure using L1-regularization paths. In AAAI 2007.
[47] Schulte, O., Frigo, G., Greiner, R. and Khosravi, H. (2010). The IMAP hybrid method for learning Gaussian Bayes nets. In Canadian AI 2010.
[48] Scutari, M. (2010). Learning Bayesian networks with the bnlearn R package. J. Stat. Softw.35 1–22.
[49] Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. MIT Press, Cambridge, MA. · Zbl 0806.62001
[50] Spirtes, P., Richardson, T., Meek, C., Scheines, R. and Glymour, C. (1998). Using path diagrams as a structural equation modeling tool. Sociol. Methods Res.27 182–225.
[51] Stekhoven, D. J., Moraes, I., Sveinbjörnsson, G., Henning, L., Maathuis, M. H. and Bühlmann, P. (2012). Causal stability ranking. Bioinformatics28 2819–2823.
[52] Tsamardinos, I., Brown, L. E. and Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn.65 31–78.
[53] Uhler, C., Raskutti, G., Bühlmann, P. and Yu, B. (2013). Geometry of the faithfulness assumption in causal inference. Ann. Statist.41 436–463. · Zbl 1267.62068
[54] van de Geer, S. and Bühlmann, P. (2013). \(ℓ_{0}\)-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist.41 536–567. · Zbl 1267.62037
[55] Verdugo, R. A., Zeller, T., Rotival, M., Wild, P. S., Münzel, T., Lackner, K. J., Weidmann, H., Ninio, E., Trégouët, D.-A., Cambien, F., Blankenberg, S. and Tiret, L. (2013). Graphical modeling of gene expression in monocytes suggests molecular mechanisms explaining increased atherosclerosis in smokers. PLoS ONE8 e50888.
[56] Verma, T. and Pearl, J. (1990). Equivalence and synthesis of causal models. In UAI 1990.
[57] Zhang, K., Peters, J., Janzing, D. and Schölkopf, B. (2011). Kernel-based conditional independence test and application in causal discovery. In UAI 2011.
[58] Zhao, T., Liu, H., Roeder, K., Lafferty, J. and Wasserman, L. (2012). The huge package for high-dimensional undirected graph estimation in R. J. Mach. Learn. Res.13 1059–1062. · Zbl 1283.68311
[59] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc.101 1418–1429. · Zbl 1171.62326
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.