×

Learning big Gaussian Bayesian networks: partition, estimation and fusion. (English) Zbl 07307465

Summary: Structure learning of Bayesian networks has always been a challenging problem. Nowadays, massive-size networks with thousands or more of nodes but fewer samples frequently appear in many areas. We develop a divide-and-conquer framework, called partition-estimation-fusion (PEF), for structure learning of such big networks. The proposed method first partitions nodes into clusters, then learns a subgraph on each cluster of nodes, and finally fuses all learned subgraphs into one Bayesian network. The PEF method is designed in a flexible way so that any structure learning method may be used in the second step to learn a subgraph structure as either a DAG or a CPDAG. In the clustering step, we adapt hierarchical clustering to automatically choose a proper number of clusters. In the fusion step, we propose a novel hybrid method that sequentially adds edges between subgraphs. Extensive numerical experiments demonstrate the competitive performance of our PEF method, in terms of both speed and accuracy compared to existing methods. Our method can improve the accuracy of structure learning by 20% or more, while reducing running time up to two orders-of-magnitude.

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

igraph; bnlearn; R; ParallelPC
PDFBibTeX XMLCite
Full Text: arXiv Link

References:

[1] Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochastic block model.IEEE Transactions on Information Theory, 62(1):471-487, 2016. · Zbl 1359.94047
[2] Bryon Aragam and Qing Zhou. Concave penalized estimation of sparse gaussian bayesian networks.Journal of Machine Learning Research, 16:2273-2328, 2015. · Zbl 1351.68203
[3] Bryon Aragam, Jiaying Gu, and Qing Zhou. Learning large-scale Bayesian networks with the sparsebn package.Journal of Statistical Software, 91(11):1-38, 2019.
[4] David Maxwell Chickering. Learning equivalence classes of bayesian-network structures. Journal of Machine Learning Research, 2(Feb):445-498, 2002a. · Zbl 1007.68179
[5] David Maxwell Chickering. Optimal structure identification with greedy search.Journal of Machine Learning Research, 3(Nov):507-554, 2002b. · Zbl 1084.68519
[6] David Maxwell Chickering and Christopher Meek. Finding optimal bayesian networks. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 94-102. Morgan Kaufmann Publishers Inc., 2002.
[7] David Maxwell Chickering, David Heckerman, and Christopher Meek. Large-sample learning of bayesian networks is np-hard.Journal of Machine Learning Research, 5(Oct): 1287-1330, 2004. · Zbl 1222.68169
[8] Peter Chin, Anup Rao, and Van Vu. Stochastic block model and community detection in sparse graphs: A spectral algorithm with optimal rate of recovery. InConference on Learning Theory, pages 391-423, 2015.
[9] Diego Colombo and Marloes H Maathuis. Order-independent constraint-based causal structure learning.Journal of Machine Learning Research, 15(1):3741-3782, 2014. · Zbl 1312.68165
[10] G´abor Cs´ardi and Tam´as Nepusz. The igraph software package for complex network research.InterJournal, Complex Systems:1695, 2006. URLhttp://igraph.org.
[11] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborov´a. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications.Physical Review E, 84(6):066106, 2011.
[12] Dean P Foster and Edward I George. The risk inflation criterion for multiple regression. The Annals of Statistics, 22(4):1947-1975, 1994. · Zbl 0829.62066
[13] Fei Fu and Qing Zhou. Learning sparse causal Gaussian networks with experimental intervention: Regularization and coordinate descent.Journal of the American Statistical Association, 108:288-300, 2013. · Zbl 06158343
[14] Jos´e A G´amez, Juan L Mateo, and Jos´e M Puerta. Learning bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood.Data Mining and Knowledge Discovery, 22(1-2):106-148, 2011. · Zbl 1235.68152
[15] Maxime Gasse, Alex Aussem, and Haytham Elghazel.An experimental comparison of hybrid algorithms for bayesian network structure learning. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 58-73. Springer, 2012.
[16] Dan Geiger and David Heckerman. Learning gaussian networks. InProceedings of the Tenth international conference on Uncertainty in artificial intelligence, pages 235-243. Morgan Kaufmann Publishers Inc., 1994. · Zbl 0831.68096
[17] Jiaying Gu, Fei Fu, and Qing Zhou. Penalized estimation of directed acyclic graphs from discrete data.Statistics and Computing, 29(1):161-176, 2019. · Zbl 1430.62018
[18] John A Hartigan. Consistency of single linkage for high-density clusters.Journal of the American Statistical Association, 76(374):388-394, 1981. · Zbl 0468.62053
[19] David Heckerman, Dan Geiger, and David M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data.Machine Learning, 20:197-243, 1995. · Zbl 0831.68096
[20] Markus Kalisch and Peter B¨uhlmann. Estimating high-dimensional directed acyclic graphs with the pc-algorithm.Journal of Machine Learning Research, 8:613-636, 2007. · Zbl 1222.68229
[21] Markus Kalisch, Martin M¨achler, Diego Colombo, Marloes H Maathuis, and Peter B¨uhlmann. Causal inference using graphical models with the R package pcalg.Journal of Statistical Software, 47(11):1-26, 2012.
[22] Steffen L Lauritzen.Graphical models. Oxford University Press, 1996. · Zbl 0907.62001
[23] Thuc Duy Le, Tao Hoang, Jiuyong Li, Lin Liu, and Shu Hu. ParallelPC: an R package for efficient constraint based causal exploration.arXiv preprint, 1510.03042, 2015. URL http://arxiv.org/pdf/1510.03042.pdf.
[24] Yuliya Marchetti and Qing Zhou. Iterative subsampling in solution path clustering of noisy big data.Statistics and Its Interface, 9(4):415-431, 2016. · Zbl 1405.62081
[25] Radhakrishnan Nagarajan and Marco Scutari.Bayesian Networks in R with Applications in Systems Biology. Springer, New York, 2013. doi: 10.1007/978-1-4614-6446-4. ISBN 978-1-4614-6445-7, 978-1-4614-6446-4. · Zbl 1272.62005
[26] Kristian G Olesen and Anders L Madsen.Maximal prime subgraph decomposition of bayesian networks.IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 32(1):21-31, 2002.
[27] Judea Pearl.Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000. · Zbl 0959.68116
[28] Joseph Ramsey, Madelyn Glymour, Ruben Sanchez-Romero, and Clark Glymour. A million variables and more: the fast greedy equivalence search algorithm for learning highdimensional graphical causal models, with an application to functional magnetic resonance images.International Journal of Data Science and Analytics, 3(2):121-129, 2017.
[29] Robert W Robinson. Counting unlabeled acyclic digraphs. InCombinatorial Mathematics V, pages 28-43. Springer, 1977. · Zbl 0376.05031
[30] Marco Scutari. Learning bayesian networks with the bnlearn R package.Journal of Statistical Software, 35(3):1-22, 2010. doi: 10.18637/jss.v035.i03.
[31] Marco Scutari. Bayesian network constraint-based structure learning algorithms: Parallel and optimized implementations in the bnlearn R package.Journal of Statistical Software, 77(2):1-20, 2017. doi: 10.18637/jss.v077.i02.
[32] Marco Scutari and Jean-Baptiste Denis.Bayesian Networks with Examples in R. Chapman and Hall, Boca Raton, 2014. ISBN 978-1-4822-2558-7, 978-1-4822-2560-0. · Zbl 1341.62025
[33] Peter Spirtes and Clark Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1):62-72, 1991.
[34] Peter Spirtes, Clark Glymour, and Richard Scheines.Causation, Prediction, and Search. Springer-Verlag, 1993. · Zbl 0806.62001
[35] Ioannis Tsamardinos, Constantin F Aliferis, and Alexander Statnikov. Time and sample efficient discovery of markov blankets and direct causal relations. InProceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 673-678. ACM, 2003.
[36] Ioannis Tsamardinos, Laura E Brown, and Constantin F Aliferis. The max-min hill-climbing bayesian network structure learning algorithm.Machine Learning, 65(1):31-78, 2006.
[37] Tomas Verma and Judea Pearl. Equivalence and synthesis of causal models. InSixth Annual Conference on Uncertainty in Artificial Intelligence, pages 220-227, 1990.
[38] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440-442, 1998. · Zbl 1368.05139
[39] Jing Xiang and Seyoung Kim. A* lasso for learning a sparse bayesian network structure for continuous variables. InAdvances in Neural Information Processing Systems, pages 2418-2426, 2013.
[40] Yiping Yuan, Xiaotong Shen, Wei Pan, and Zizhuo Wang. Constrained likelihood for reconstructing a directed acyclic Gaussian graph.Biometrika, 106:109-125, 2019. · Zbl 1506.62298
[41] Yi-feng Zeng and Kim-leng Poh. Block learning bayesian network structure from data. In Fourth International Conference on Hybrid Intelligent Systems (HIS’04), pages 14-19. IEEE, 2004.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.