
Identifiability of additive noise models using conditional variances. (English) Zbl 1498.68256

Summary: This paper considers a new identifiability condition for additive noise models (ANMs), in which each variable is determined by an arbitrary Borel-measurable function of its parents plus an independent error. It has been shown that ANMs are fully recoverable under certain identifiability conditions, such as when all error variances are equal. However, this condition can be restrictive, and hence this paper focuses on a relaxed identifiability condition that involves not only the error variances but also the influence of the parents. This new class of identifiable ANMs places no constraints on the form of the dependencies or the distributions of the errors, and it allows different error variances. The paper further provides a statistically consistent and computationally feasible structure learning algorithm for the identifiable ANMs based on the new identifiability condition. The proposed algorithm assumes that all relevant variables are observed, but it assumes neither faithfulness nor a sparse graph. Extensive experiments on simulated and real multivariate data demonstrate that the proposed algorithm successfully recovers directed acyclic graphs.
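The core of such a learning algorithm is an ordering step driven by conditional variances: at each stage, among the variables not yet ordered, the one with the smallest conditional variance given the already-ordered variables is appended next. Below is a minimal Python sketch of this idea. It is illustrative, not the paper's implementation: it estimates conditional variances from linear regression residuals (the paper allows arbitrary Borel-measurable functions, so a nonparametric regressor would be needed in general), and all function names and the toy data are hypothetical.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def conditional_variance(X, target, conditioning):
        """Estimate Var(X_target | X_conditioning) from the residual
        variance of regressing X_target on the conditioning variables
        (linear regression here, purely for illustration)."""
        y = X[:, target]
        if not conditioning:
            return y.var()
        Z = X[:, conditioning]
        residuals = y - LinearRegression().fit(Z, y).predict(Z)
        return residuals.var()

    def order_by_conditional_variance(X):
        """Greedy forward ordering: repeatedly append the not-yet-ordered
        variable whose conditional variance given the ordered set is
        smallest."""
        p = X.shape[1]
        order, remaining = [], list(range(p))
        while remaining:
            j = min(remaining,
                    key=lambda k: conditional_variance(X, k, order))
            order.append(j)
            remaining.remove(j)
        return order

    # Toy linear ANM X1 -> X2 -> X3 with heterogeneous error variances,
    # chosen so that the conditional-variance ordering condition holds.
    rng = np.random.default_rng(0)
    n = 5000
    x1 = rng.normal(scale=1.0, size=n)
    x2 = 1.5 * x1 + rng.normal(scale=1.0, size=n)
    x3 = 0.7 * x2 + rng.normal(scale=0.9, size=n)
    X = np.column_stack([x1, x2, x3])
    print(order_by_conditional_variance(X))  # expected ordering: [0, 1, 2]

A full DAG estimate would additionally require a variable-selection step that prunes non-parents among each variable's predecessors in the recovered ordering; the sketch above covers only the ordering step.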

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62D20 Causal inference from observational studies
62H12 Estimation in multivariate analysis
62H22 Probabilistic graphical models
