
Learning Bayesian network parameters via minimax algorithm. (English) Zbl 1456.68146

Summary: Parameter learning is an important aspect of learning in Bayesian networks. Although the maximum likelihood algorithm is often effective, it suffers from overfitting when data are insufficient. To address this, prior distributions are often imposed on the model parameters. When training a Bayesian network, the parameters are optimized to fit the data; imposing prior distributions, however, can reduce the fit between parameters and data. A trade-off is therefore needed between fitting and overfitting. In this study, a new algorithm, named MiniMax Fitness (MMF), is developed to address this problem. The method consists of three main steps. First, the maximum a posteriori estimate that combines data and prior distribution is derived. Then, the hyper-parameters of the prior distribution are optimized to minimize the fitness between the posterior estimate and the data. Finally, the order of the posterior estimates is checked and adjusted to match the order of the statistical counts from the data. In addition, we introduce an improved constrained maximum entropy method, named Prior Free Constrained Maximum Entropy (PF-CME), to facilitate parameter learning when domain knowledge is provided. Experiments show that the proposed methods outperform most existing parameter learning methods.
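The first and third steps described above can be illustrated with a minimal sketch, assuming a Dirichlet prior over each conditional distribution (the standard conjugate choice for Bayesian network parameters; function names and the hyper-parameter optimization are omitted or illustrative, not the authors' implementation):

```python
def map_estimate(counts, alpha):
    """Step one (sketch): MAP estimate of one conditional distribution
    under a Dirichlet prior -- theta_k = (N_k + alpha_k) / (N + alpha_0).

    counts : empirical counts N_k for each state of the variable
    alpha  : Dirichlet hyper-parameters alpha_k encoding the prior
    """
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

def matches_count_order(theta, counts):
    """Step three (sketch): check that the posterior estimates rank the
    states in the same order as the raw statistical counts."""
    rank = lambda xs: sorted(range(len(xs)), key=lambda i: xs[i])
    return rank(theta) == rank(counts)
```

For example, with counts `[8, 2]` and a uniform prior `alpha = [1, 1]`, `map_estimate` returns `[0.75, 0.25]`, which preserves the ordering of the counts. The second step of MMF (tuning `alpha` to balance fit and overfitting) sits between these two sketched functions.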

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H22 Probabilistic graphical models

References:

[1] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (1988), Morgan Kaufmann: Morgan Kaufmann San Mateo, California
[2] Ibrahim, W.; Beiu, V., Using Bayesian networks to accurately calculate the reliability of complementary metal oxide semiconductor gates, IEEE Trans. Reliab., 60, 3, 538-549 (2011)
[3] Mascaro, S.; Nicholson, A. E.; Korb, K. B., Anomaly detection in vessel tracks using Bayesian networks, Int. J. Approx. Reason., 55, 1, 84-98 (2014)
[4] Infantes, G.; Ghallab, M.; Ingrand, F., Learning the behavior model of a robot, Auton. Robots, 30, 2, 157-177 (2011)
[5] Tamada, Y.; Imoto, S.; Araki, H.; Nagasaki, M.; Print, C.; Charnock-Jones, D. S.; Miyano, S., Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers, IEEE/ACM Trans. Comput. Biol. Bioinform., 8, 3, 683-697 (2011)
[6] Landuyt, D.; Broekx, S.; D’hondt, R.; Engelen, G.; Aertsens, J.; Goethals, P. L., A review of Bayesian belief networks in ecosystem service modelling, Environ. Model. Softw., 46, 1-11 (2013)
[7] Wachowski, N.; Azimi-Sadjadi, M. R., Detection and classification of nonstationary transient signals using sparse approximations and Bayesian networks, IEEE/ACM Trans. Audio Speech Lang. Process., 22, 12, 1750-1764 (2014)
[8] Almond, R. G.; Mislevy, R. J.; Steinberg, L. S.; Yan, D.; Williamson, D. M., Bayesian Networks in Educational Assessment (2015), Springer
[9] Redner, R. A.; Walker, H. F., Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev., 26, 2, 195-239 (1984) · Zbl 0536.62021
[10] Isozaki, T.; Kato, N.; Ueno, M., “Data temperature” in minimum free energies for parameter learning of Bayesian networks, Int. J. Artif. Intell. Tools, 18, 05, 653-671 (2009)
[11] Hu, J.; Tang, X.; Qiu, J., A Bayesian network approach for predicting seismic liquefaction based on interpretive structural modeling, Georisk: Assess. Manag. Risk Eng. Syst. Geohazards, 9, 3, 200-217 (2015)
[12] Constantinou, A. C.; Freestone, M.; Marsh, W.; Fenton, N.; Coid, J., Risk assessment and risk management of violent reoffending among prisoners, Expert Syst. Appl., 42, 21, 7511-7529 (2015)
[13] Seixas, F. L.; Zadrozny, B.; Laks, J.; Conci, A.; Saade, D. C.M., A Bayesian network decision model for supporting the diagnosis of dementia, Alzheimer’s disease and mild cognitive impairment, Comput. Biol. Med., 51, 140-158 (2014)
[14] Cooper, G. F.; Herskovits, E., A Bayesian method for the induction of probabilistic networks from data, Mach. Learn., 9, 4, 309-347 (1992) · Zbl 0766.68109
[15] Clarke, B. S.; Barron, A. R., Jeffreys’ prior is asymptotically least favorable under entropy risk, J. Stat. Plan. Inference, 41, 1, 37-60 (1994) · Zbl 0820.62006
[16] Suzuki, J., Learning Bayesian belief networks based on the minimum description length principle: basic properties, IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 82, 10, 2237-2245 (1999)
[17] Wittig, F.; Jameson, A., Exploiting qualitative knowledge in the learning of conditional probabilities of Bayesian networks, (Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (2000), Morgan Kaufmann), 644-652
[18] Altendorf, E. E.; Restificar, A. C.; Dietterich, T. G., Learning from sparse data by exploiting monotonicity constraints, (Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (2005), AUAI Press), 18-26
[19] Zhou, Y.; Fenton, N.; Zhu, C., An empirical study of Bayesian network parameter learning with monotonic influence constraints, Decis. Support Syst., 87, 69-79 (2016)
[20] Feelders, A.; Van der Gaag, L. C., Learning Bayesian network parameters under order constraints, Int. J. Approx. Reason., 42, 1-2, 37-53 (2006) · Zbl 1096.68697
[21] De Campos, C. P.; Tong, Y.; Ji, Q., Constrained maximum likelihood learning of Bayesian networks for facial action recognition, (Proceedings of the 10th European Conference on Computer Vision (2008), Springer), 168-181
[22] De Campos, C. P.; Ji, Q., Improving Bayesian network parameter learning using constraints, (Proceedings of the 19th International Conference on Pattern Recognition (2008), IEEE), 1-4
[23] Chang, R.; Wang, W., Novel algorithm for Bayesian network parameter learning with informative prior constraints, (Proceedings of the International Joint Conference on Neural Networks (2010), IEEE), 1-8
[24] Guo, Z.; Gao, X.; Ren, H.; Yang, Y.; Di, R.; Chen, D., Learning Bayesian network parameters from small data sets: a further constrained qualitatively maximum a posteriori method, Int. J. Approx. Reason., 91, 22-35 (2017) · Zbl 1419.68075
[25] Bickel, P. J., Minimax estimation of the mean of a normal distribution when the parameter space is restricted, Ann. Stat., 9, 6, 1301-1309 (1981) · Zbl 0484.62013
[26] Zou, G.; Wan, A., Admissible and minimax estimation of the parameter n in the binomial distribution, J. Stat. Plan. Inference, 113, 2, 451-466 (2003) · Zbl 1026.62002
[27] Takimoto, E.; Warmuth, M., The minimax strategy for Gaussian density estimation, (Proceedings of the 13th Annual Conference on Computer Learning Theory (2000), Morgan Kaufmann), 100-106
[28] Silander, T.; Roos, T.; Myllymäki, P., Locally minimax optimal predictive modeling with Bayesian networks, (Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (2009), Microtome Publishing), 504-511
[29] Spiegelhalter, D. J.; Lauritzen, S. L., Sequential updating of conditional probabilities on directed graphical structures, Networks, 20, 5, 579-605 (1990) · Zbl 0697.90045
[30] Kullback, S.; Leibler, R. A., On information and sufficiency, Ann. Math. Stat., 22, 1, 79-86 (1951) · Zbl 0042.38403