Compatible priors for model selection of high-dimensional Gaussian DAGs. (English) Zbl 1455.62062

Summary: Graphical models provide a powerful framework for incorporating conditional independence structure into the statistical analysis of high-dimensional data. In this paper we focus on Directed Acyclic Graphs (DAGs). In the Gaussian setting, a prior recently introduced for the parameters associated with the (modified) Cholesky decomposition of the precision matrix is the DAG-Wishart. The flexibility afforded by a rich choice of shape hyperparameters, coupled with conjugacy, makes this prior especially attractive for estimation and prediction. In this paper we look at the DAG-Wishart prior from the perspective of model selection, with special reference to its consistency properties in high-dimensional settings. We show that Bayes factor consistency holds only when comparing two DAGs which do not belong to the same Markov equivalence class, that is, which encode distinct conditional independencies; a similar result holds for posterior ratio consistency. We also prove that DAG-Wishart distributions with arbitrarily chosen hyperparameters lead to incompatible priors for model selection, because they assign different marginal likelihoods to Markov equivalent graphs. To overcome this difficulty, we propose a constructive method to specify DAG-Wishart priors whose suitably constrained shape hyperparameters ensure compatibility for DAG model selection.
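
The notion of Markov equivalence used in the summary can be illustrated with a small sketch. It is not taken from the paper: it assumes the standard graphical criterion that two DAGs on the same vertex set are Markov equivalent exactly when they share the same skeleton and the same v-structures. The function names and the edge-list representation (pairs `(u, v)` meaning `u -> v`) are hypothetical choices made for this illustration.

```python
from itertools import combinations


def skeleton(edges):
    """Return the undirected skeleton as a set of unordered vertex pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Return triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Check equivalence of two DAGs assumed to live on the same vertex set."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Three DAGs on vertices {1, 2, 3} with the same skeleton 1 - 2 - 3.
chain = [(1, 2), (2, 3)]      # 1 -> 2 -> 3
fork = [(2, 1), (2, 3)]       # 1 <- 2 -> 3
collider = [(1, 2), (3, 2)]   # 1 -> 2 <- 3

print(markov_equivalent(chain, fork))      # same skeleton, no v-structures
print(markov_equivalent(chain, collider))  # collider has the v-structure 1 -> 2 <- 3
```

In this toy case the chain and the fork encode the same conditional independence (1 and 3 independent given 2), so a compatible prior in the sense of the summary would have to give them equal marginal likelihoods, whereas the collider belongs to a different equivalence class.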

MSC:

62F15 Bayesian inference
62F07 Statistical ranking and selection procedures
62H22 Probabilistic graphical models

Software:

pcalg
