
Robust causal structure learning with some hidden variables. (English) Zbl 1420.62361

Summary: We introduce a new method to estimate the Markov equivalence class of a directed acyclic graph (DAG) in the presence of hidden variables, in settings where the underlying DAG among the observed variables is sparse, and there are a few hidden variables that have a direct effect on many of the observed variables. Building on the so-called low rank plus sparse framework, we suggest a two-stage approach which first removes the effect of the hidden variables and then estimates the Markov equivalence class of the underlying DAG under the assumption that there are no remaining hidden variables. This approach is consistent in certain high dimensional regimes and performs favourably when compared with the state of the art, in terms of both graphical structure recovery and total causal effect estimation.
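The low rank plus sparse structure that the summary builds on can be illustrated with a small numerical sketch. This is not the authors' estimator: it only checks, at the population level and for a hypothetical toy linear Gaussian model, that the precision matrix of the observed variables equals a sparse matrix minus a low-rank matrix whose rank is at most the number of hidden variables. The function names, the chain-shaped DAG, the edge weights and the unit noise variances are all assumptions chosen for illustration.

```python
import numpy as np


def joint_precision(p=8, h=2, seed=0):
    """Precision matrix of (X, H) in a toy linear Gaussian model (assumed setup).

    Variables are ordered X_0..X_{p-1}, H_0..H_{h-1}. The observed DAG is the
    chain X_0 -> X_1 -> ..., and every hidden variable points to every X_i.
    """
    rng = np.random.default_rng(seed)
    d = p + h
    B = np.zeros((d, d))  # B[i, j] is the direct effect of variable j on i
    for i in range(1, p):
        B[i, i - 1] = 0.7
    B[:p, p:] = rng.uniform(0.5, 1.0, size=(p, h))
    # With unit-variance noise, x = B x + e gives precision (I - B)^T (I - B).
    A = np.eye(d) - B
    return A.T @ A


def sparse_and_low_rank(K, p):
    """Split the marginal precision of X into a sparse part S and a low-rank part L."""
    K_xx, K_xh, K_hh = K[:p, :p], K[:p, p:], K[p:, p:]
    S = K_xx                                  # precision of X given H
    L = K_xh @ np.linalg.solve(K_hh, K_xh.T)  # rank at most h
    return S, L


def marginal_precision(K, p):
    """Precision of X alone, obtained by inverting the X block of the covariance."""
    sigma = np.linalg.inv(K)
    return np.linalg.inv(sigma[:p, :p])


p, h = 8, 2
K = joint_precision(p, h)
S, L = sparse_and_low_rank(K, p)
print("identity holds:", np.allclose(marginal_precision(K, p), S - L))
print("rank of L:", np.linalg.matrix_rank(L))
print("nonzeros in S:", int(np.count_nonzero(np.abs(S) > 1e-12)), "of", p * p)
```

In this toy model the sparse part S is the precision matrix of the observed variables given the hidden ones, which is the quantity a first stage would try to recover before the equivalence class is estimated as if no hidden variables remained. With data, S and L would have to be estimated rather than read off a known joint model.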

MSC:

62M05 Markov processes: estimation; hidden Markov models
62G35 Nonparametric robustness
05C90 Applications of graph theory

Software:

pcalg; TETRAD
