
Learning causal networks via additive faithfulness. (English) Zbl 1475.68279

The paper introduces a novel statistical model for causal network analysis, called the additively faithful directed acyclic graph (AFDAG), together with an estimation algorithm of partial-correlation type. The algorithm is based on additive conditional independence (ACI), which offers the flexibility of a nonparametric approach and can capture nonlinear interactions that elude a Gaussian or copula-Gaussian graphical model. Two other key ingredients of the AFDAG are a linear operator, called the normalized additive covariance operator, and a modification of the definition of ACI so that it is characterized via an additive reproducing kernel Hilbert space instead of the \(L_2\) space. Using these concepts, the paper presents a substantial body of theoretical results concerning convergence rates, consistency, uniform consistency, and strong additive faithfulness. Probably the most important among them states that if the norm of the empirical version of the normalized additive covariance operator is small for some set of components \(X^S\) and some pair of components \(X^i\), \(X^j\) with \(i,j\notin S\), then \(X^i\) and \(X^j\) are, with high probability, conditionally independent given \(X^S\); moreover, the complexity of the algorithm depends not on the dimension of the network but on the level of sparseness of the AFDAG. From a practical point of view, the computational simplicity of the AFDAG is noteworthy, as it allows the method to handle high-dimensional networks.
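The pivotal implication above can be sketched as follows. Writing \(\hat{\Sigma}^{ij\mid S}\) for the empirical normalized additive covariance operator between \(X^i\) and \(X^j\) given \(X^S\), and \(\epsilon_n\) for a sample-size-dependent threshold (both symbols are this reviewer's shorthand, not necessarily the paper's notation), the result asserts roughly that
\[
\bigl\|\hat{\Sigma}^{ij\mid S}\bigr\| \le \epsilon_n
\;\Longrightarrow\;
X^i \mathrel{\perp\!\!\!\perp} X^j \mid X^S
\quad \text{(in the additive sense), with probability tending to one,}
\]
so that thresholding these empirical operator norms yields a consistent nonparametric test for conditional independence, which the estimation algorithm applies only over small conditioning sets \(S\) dictated by the sparseness of the AFDAG.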

MSC:

68T05 Learning and adaptive systems in artificial intelligence
46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)
62D20 Causal inference from observational studies
62G05 Nonparametric estimation
62H22 Probabilistic graphical models
