
Estimating high-dimensional intervention effects from observational data. (English) Zbl 1191.62118

Summary: We assume that we have observational data generated from an unknown underlying directed acyclic graph (DAG) model. A DAG is typically not identifiable from observational data, but it is possible to consistently estimate the equivalence class of a DAG. Moreover, for any given DAG, causal effects can be estimated using intervention calculus. We combine these two parts. For each DAG in the estimated equivalence class, we use intervention calculus to estimate the causal effects of the covariates on the response. This yields a collection of estimated causal effects for each covariate. We show that the distinct values in this collection can be consistently estimated by an algorithm that uses only local information of the graph. This local approach is computationally fast and feasible in high-dimensional problems. We propose to use summary measures of the set of possible causal effects to determine variable importance. In particular, we use the minimum absolute value of this set, since that is a lower bound on the size of the causal effect. We demonstrate the merits of our methods in a simulation study and on a data set about riboflavin production.
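
To make the local approach in the summary concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a linear Gaussian model, in which the causal effect of a covariate on the response under a given DAG is the regression coefficient of that covariate after adjusting for its parents. It also assumes the estimated equivalence class is already available in the form of directed parents and undirected neighbours for the covariate of interest. All function names, the toy graph, and the noise-free data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def ols_coef(y, columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    k = len(X[0])
    # Augmented normal equations (X'X | X'y).
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def possible_effects(data, x, y, parents, siblings, adjacent):
    """Candidate causal effects of x on y, using only the neighbourhood of x.

    parents:  nodes with a directed edge into x in the equivalence class
    siblings: nodes joined to x by an undirected edge
    adjacent: function (a, b) -> bool, True if a and b share an edge
    """
    effects = []
    sib = sorted(siblings)
    for size in range(len(sib) + 1):
        for extra in combinations(sib, size):
            pa = sorted(parents) + list(extra)
            # Local check: treating `extra` as additional parents of x must
            # not create a new v-structure with x as the collider.
            if any(not adjacent(s, p) for s in extra for p in pa if p != s):
                continue
            if y in pa:
                effects.append(0.0)  # y would be a cause of x, not an effect
            else:
                coef = ols_coef(data[y], [data[x]] + [data[p] for p in pa])
                effects.append(coef[1])  # coefficient of x
    return effects


# Toy equivalence class (assumed): X1 - X2 undirected, X1 -> Y, X2 -> Y, Z -> Y.
edges = {frozenset(e) for e in [("X1", "X2"), ("X1", "Y"), ("X2", "Y"), ("Z", "Y")]}


def adjacent(a, b):
    return frozenset((a, b)) in edges


# Noise-free toy data built from orthogonal +/-1 patterns.
a = [-1, -1, -1, -1, 1, 1, 1, 1]
b = [-1, -1, 1, 1, -1, -1, 1, 1]
c = [-1, 1, -1, 1, -1, 1, -1, 1]
x1 = [float(v) for v in a]
x2 = [0.5 * u + v for u, v in zip(a, b)]            # X2 = 0.5 * X1 + e2
z = [float(v) for v in c]
yv = [u + 2.0 * v + 3.0 * w for u, v, w in zip(x1, x2, z)]
data = {"X1": x1, "X2": x2, "Z": z, "Y": yv}

effects = possible_effects(data, "X1", "Y", parents=set(),
                           siblings={"X2"}, adjacent=adjacent)
lower_bound = min(abs(e) for e in effects)
print(effects, lower_bound)
```

In this toy example the undirected edge between X1 and X2 leaves two candidate parent sets for X1. The empty set gives the total effect 1 + 2 × 0.5 = 2, and the set {X2} gives the direct effect 1, so the minimum absolute value, 1, is the lower bound on the size of the effect that the summary proposes as a measure of variable importance.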

MSC:

62H99 Multivariate analysis
05C90 Applications of graph theory
65C60 Computational problems in statistics (MSC2010)
05C20 Directed graphs (digraphs), tournaments

Software:

pcalg; ggm; TETRAD
