Learning a high-dimensional linear structural equation model via \(\ell_1\)-regularized regression. (English) Zbl 07370619

Summary: This paper develops a new approach to learning high-dimensional linear structural equation models (SEMs) without the commonly assumed faithfulness, Gaussian error distribution, and equal error distribution conditions. Key components of the algorithm are component-wise ordering and parent estimation, both of which can be addressed efficiently using \(\ell_1\)-regularized regression. This paper proves that sample sizes \(n = \Omega(d^2 \log p)\) and \(n = \Omega(d^2 p^{2/m})\) are sufficient for the proposed algorithm to recover linear SEMs with sub-Gaussian and \((4m)\)-th bounded-moment error distributions, respectively, where \(p\) is the number of nodes and \(d\) is the maximum degree of the moralized graph. The worst-case computational complexity is further shown to be \(O(n (p^3 + p^2 d^2))\); hence, the proposed algorithm is statistically consistent and computationally feasible for learning a high-dimensional linear SEM whose moralized graph is sparse. Simulations verify these statistical and computational guarantees and show that the proposed algorithm compares favorably with the state-of-the-art US, GDS, LISTEN, and TD algorithms in our settings. We also demonstrate, using real COVID-19 data, that the proposed algorithm is well suited to estimating a virus-spread map in China.
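To make the two building blocks concrete, here is a minimal Python sketch of component-wise ordering and parent estimation via lasso fits. It is an illustration under stated assumptions, not the paper's algorithm: the node-selection rule below (smallest residual variance among the remaining nodes, essentially the equal-variance ordering criterion) stands in for the paper's ordering statistic, cross-validated regularization stands in for a theoretically tuned penalty, and the names `learn_linear_sem` and `support_tol` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def learn_linear_sem(X, support_tol=1e-3):
    """Sketch: estimate a causal ordering and parent sets of a linear SEM.

    At each step, every remaining node is lasso-regressed on the nodes
    already ordered; the node with the smallest residual variance is
    appended next, and its parents are read off the lasso support.
    NOTE: the minimum-residual-variance rule assumes (roughly) equal
    error variances and is a stand-in for the paper's criterion.
    """
    n, p = X.shape
    ordering, parents = [], {}
    remaining = list(range(p))
    while remaining:
        best = None  # (residual variance, node, lasso coefficients)
        for j in remaining:
            if ordering:
                Z = X[:, ordering]
                fit = LassoCV(cv=5).fit(Z, X[:, j])
                resid = X[:, j] - fit.predict(Z)
                coef = fit.coef_
            else:  # first step: no candidate parents yet
                resid = X[:, j] - X[:, j].mean()
                coef = np.empty(0)
            if best is None or resid.var() < best[0]:
                best = (resid.var(), j, coef)
        _, j, coef = best
        # Parents = ordered predecessors with non-negligible lasso weight.
        parents[j] = [ordering[k]
                      for k in np.flatnonzero(np.abs(coef) > support_tol)]
        ordering.append(j)
        remaining.remove(j)
    return ordering, parents

# Toy usage: a three-node chain X0 -> X1 -> X2.
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = -0.6 * x1 + rng.normal(size=n)
print(learn_linear_sem(np.column_stack([x0, x1, x2])))
# Expected (up to lasso thresholding): ordering [0, 1, 2],
# parents {0: [], 1: [0], 2: [1]}
```

Each of the \(p\) ordering steps fits at most \(p\) lasso regressions, which is roughly where the polynomial dependence on \(p\) in the stated \(O(n (p^3 + p^2 d^2))\) complexity comes from.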

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

DirectLiNGAM; glmnet
