
Learning Bayesian network classifiers: Searching in a space of partially directed acyclic graphs. (English) Zbl 1101.68710

Summary: There is a commonly held opinion that algorithms for learning unrestricted Bayesian networks, especially those based on the score+search paradigm, are not suitable for building competitive Bayesian network-based classifiers. Several specialized algorithms that carry out the search over restricted types of Directed Acyclic Graph (DAG) topologies have therefore been developed, most of them being extensions (using augmenting arcs) or modifications of the basic Naive Bayes topology. In this paper, we present a new algorithm for inducing classifiers based on Bayesian networks that obtains excellent results even when standard scoring functions are used. The method performs a simple local search in a space different from those of unrestricted or augmented DAGs. Our search space consists of a type of partially directed acyclic graph that combines two concepts of DAG equivalence: classification equivalence and independence equivalence. The results of exhaustive experimentation indicate that the proposed method can compete with state-of-the-art classification algorithms.
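The score+search paradigm mentioned above can be illustrated by a minimal greedy local search over augmenting arcs between feature nodes, a pattern shared by many Bayesian-network-classifier learners. This is only a hedged sketch, not the paper's algorithm: the search space here is plain DAGs rather than the partially directed graphs the paper proposes, and the scoring function is a hypothetical toy stand-in for a real data-driven metric such as BDeu or log-likelihood.

```python
from itertools import permutations

def is_acyclic(nodes, arcs):
    # Kahn's algorithm: repeatedly remove nodes with no incoming arcs;
    # the graph is acyclic iff every node gets removed.
    indeg = {n: 0 for n in nodes}
    for _, v in arcs:
        indeg[v] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for a, b in arcs:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

def hill_climb(features, score, max_parents=1):
    """Greedily toggle augmenting arcs between features while the score improves."""
    arcs = set()                      # start from plain Naive Bayes (no augmenting arcs)
    best = score(arcs)
    improved = True
    while improved:
        improved = False
        for u, v in permutations(features, 2):
            cand = arcs ^ {(u, v)}    # toggle a single arc (add or remove)
            n_parents = sum(1 for _, b in cand if b == v)
            if (u, v) not in arcs and n_parents > max_parents:
                continue              # respect the parent bound when adding
            if not is_acyclic(features, cand):
                continue
            s = score(cand)
            if s > best:
                arcs, best, improved = cand, s, True
    return arcs, best

# Toy score (assumption, for illustration only): rewards the arc (0, 1)
# and penalizes model complexity; a real learner would score against data.
toy_score = lambda arcs: len(arcs & {(0, 1)}) - 0.1 * len(arcs)
structure, value = hill_climb([0, 1, 2], toy_score)
```

With `max_parents=1`, the structures reachable from the empty arc set correspond to forest-augmented Naive Bayes topologies; raising the bound widens the search space at the usual cost of more expensive scoring.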

MSC:

68R10 Graph theory (including graph drawing) in computer science
