Bayesian model learning based on predictive entropy. (English) Zbl 1100.62031

Summary: The Bayesian paradigm has been widely acknowledged as a coherent approach to learning putative probability model structures from a finite class of candidate models. Bayesian learning is based on measuring the predictive ability of a model in terms of the corresponding marginal data distribution, which equals the expectation of the likelihood with respect to a prior distribution for the model parameters. The main controversy related to this learning method stems from the necessity of specifying proper prior distributions for all unknown parameters of a model, which ensures a complete determination of the marginal data distribution. Even for commonly used models, subjective priors may be difficult to specify precisely, and several automated learning procedures have therefore been suggested in the literature. Here we introduce a novel Bayesian learning method, based on the predictive entropy of a probability model, that can combine both subjective and objective probabilistic assessments of uncertain quantities in putative models. It is shown that our approach can avoid some of the limitations of previously suggested objective Bayesian methods.
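The marginal data distribution mentioned in the summary can be illustrated with a minimal sketch. It does not reproduce the predictive-entropy method of the paper, which the summary does not spell out; it only shows the standard computation of a marginal likelihood as the prior expectation of the likelihood, and the resulting posterior probabilities over a finite class of candidate models. The two candidate models (a Bernoulli model with a Beta prior and a Bernoulli model with a fixed success probability), the data, and all function names are assumptions chosen for illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_beta_bernoulli(successes, failures, a=1.0, b=1.0):
    """Log marginal likelihood of an ordered binary sequence.

    The Bernoulli likelihood is integrated against a proper Beta(a, b)
    prior, which gives B(a + successes, b + failures) / B(a, b).
    """
    return log_beta(a + successes, b + failures) - log_beta(a, b)


def log_marginal_fixed(successes, failures, p=0.5):
    """Log likelihood of the same sequence when p is fixed (no free parameter)."""
    return successes * math.log(p) + failures * math.log(1.0 - p)


def posterior_model_probs(log_marginals, prior_probs):
    """Posterior model probabilities over a finite class of candidate models."""
    log_post = [lm + math.log(pr) for lm, pr in zip(log_marginals, prior_probs)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical data: 9 successes and 1 failure in 10 trials.
successes, failures = 9, 1
log_marginals = [
    log_marginal_beta_bernoulli(successes, failures),  # uniform Beta(1, 1) prior
    log_marginal_fixed(successes, failures),           # fixed p = 0.5
]
probs = posterior_model_probs(log_marginals, [0.5, 0.5])
print(probs)
```

For these assumed data the marginal likelihood is 1/110 under the uniform prior and 1/1024 under the fixed-probability model, so the first model receives the larger posterior probability. The sketch also shows why a proper prior is needed: with an improper prior the normalizing term `log_beta(a, b)` would be undefined, and the marginal data distribution would not be completely determined.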


62F15 Bayesian inference
62B10 Statistical aspects of information-theoretic topics
68T05 Learning and adaptive systems in artificial intelligence

