A case study of the widely applicable Bayesian information criterion and its optimality. (English) Zbl 1332.62099

Summary: In Bayesian statistics, the marginal likelihood (evidence) is a key quantity for measuring model goodness. For many practical model families, however, it cannot be computed analytically, so one must resort to an approximation or to a time-consuming sampling method. The widely applicable Bayesian information criterion (WBIC) was recently developed as a marginal likelihood approximation that also works for singular models. Its central idea is to select a single thermodynamic integration term (power posterior) at the (approximately) optimal temperature \(\beta^*=1/\log (n)\), where \(n\) is the data size. We apply this approximation to Gaussian process regression, which is analytically solvable, to show that the optimal temperature may also depend on the data themselves or on other variables, such as the noise level. Moreover, we show that the steepness of the thermodynamic curve at the optimal temperature indicates the magnitude of the error that WBIC makes.
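
The idea in the summary can be illustrated with a minimal sketch. It is not the paper's code and does not use GPstuff. It assumes a zero-mean Gaussian process prior with a squared-exponential kernel, Gaussian noise of known variance, and a synthetic data set, all chosen here for illustration. Under these assumptions the power posterior over the latent function is Gaussian, so the thermodynamic curve \(\beta \mapsto E_\beta[\log p(y\mid f)]\) has a closed form, and its integral over \([0,1]\) equals the log evidence. The sketch evaluates the curve at \(\beta^*=1/\log (n)\) and, by bisection, finds the temperature at which the curve actually equals the exact log evidence. It uses the log-evidence sign convention; WBIC is often stated as the negative of this quantity. The helper names (`rbf_kernel`, `gp_thermo_terms`, `optimal_temperature`) are hypothetical.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance matrix for 1-D inputs x (assumed kernel)."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_thermo_terms(K, y, noise_var):
    """Return (log_evidence, expected_loglik) for GP regression.

    expected_loglik(beta) is E[log p(y | f)] under the power posterior
    p(f | y, beta), which is proportional to p(y | f)**beta * p(f).
    Everything is evaluated in the eigenbasis of the prior covariance K.
    """
    n = y.size
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    z2 = (U.T @ y) ** 2            # squared data coordinates in the eigenbasis
    const = -0.5 * n * np.log(2.0 * np.pi * noise_var)

    def expected_loglik(beta):
        denom = beta * lam + noise_var
        # squared residual of the power-posterior mean + posterior variance
        return const - 0.5 * np.sum(z2 * noise_var / denom**2 + lam / denom)

    # Exact log N(y; 0, K + noise_var * I)
    log_evidence = (
        -0.5 * np.sum(z2 / (lam + noise_var) + np.log(lam + noise_var))
        - 0.5 * n * np.log(2.0 * np.pi)
    )
    return log_evidence, expected_loglik


def optimal_temperature(log_evidence, expected_loglik, tol=1e-12):
    """Bisection for beta in [0, 1] with expected_loglik(beta) == log_evidence.

    The curve is non-decreasing in beta (its derivative is a variance),
    and the log evidence is its average over [0, 1], so a root exists.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_loglik(mid) < log_evidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic data, purely illustrative
rng = np.random.default_rng(0)
n = 30
x = np.linspace(0.0, 5.0, n)
noise_var = 0.25
y = np.sin(x) + np.sqrt(noise_var) * rng.standard_normal(n)

K = rbf_kernel(x)
log_z, ell = gp_thermo_terms(K, y, noise_var)

beta_wbic = 1.0 / np.log(n)          # temperature used by WBIC
wbic_estimate = ell(beta_wbic)       # single-term approximation of log evidence
beta_opt = optimal_temperature(log_z, ell)

print(f"exact log evidence      : {log_z:.4f}")
print(f"WBIC-style estimate     : {wbic_estimate:.4f}  (beta = {beta_wbic:.4f})")
print(f"data-dependent optimum  : beta = {beta_opt:.4f}")
```

Rerunning the sketch with a different `noise_var` or a different draw of `y` shows how the root found by bisection moves, while `beta_wbic` depends only on `n`.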

MSC:

62F15 Bayesian inference
62G08 Nonparametric regression and quantile regression

Software:

GPstuff

References:

[1] Calderhead, B., Girolami, M.: Estimating Bayes factors via thermodynamic integration and population MCMC. Comput. Stat. Data Anal. 53, 4028-4045 (2009) · Zbl 1453.62055 · doi:10.1016/j.csda.2009.07.025
[2] Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47, 547-553 (2009) · doi:10.1016/j.dss.2009.05.016
[3] Filippone, M.: Bayesian inference for Gaussian process classifiers with annealing and exact-approximate MCMC. arXiv:1311.7320 (2013) · Zbl 1320.62010
[4] Filippone, M., Zhong, M., Girolami, M.: A comparative evaluation of stochastic-based inference methods for Gaussian process models. Mach. Learn. 93, 93-114 (2012) · Zbl 1294.62048 · doi:10.1007/s10994-013-5388-x
[5] Friel, N., Pettitt, A.N.: Marginal likelihood estimation via power posteriors. J. R. Stat. Soc. Series B (Stat. Methodol.) 70, 589-607 (2008) · Zbl 05563360 · doi:10.1111/j.1467-9868.2007.00650.x
[6] Friel, N., Hurn, M., Wyse, J.: Improving power posterior estimation of statistical evidence. Stat. Comput. (2013). doi:10.1007/s11222-013-9397-1 · Zbl 1322.62098
[7] Kuss, M., Rasmussen, C.E.: Assessing approximate inference for binary Gaussian process classification. J. Mach. Learn. Res. 6, 1679-1704 (2005) · Zbl 1190.62119
[8] Murray, I., Adams, R.P.: Slice sampling covariance hyperparameters of latent Gaussian models. Adv. Neural Inf. Process. Syst. 23, 1723-1731 (2010)
[9] Nickisch, H., Rasmussen, C.E.: Approximations for binary Gaussian process classification. J. Mach. Learn. Res. 9, 2035-2078 (2008) · Zbl 1225.62087
[10] Petersen, K.B., Pedersen, M.S.: The Matrix Cookbook. Version: November 14 (2008) · Zbl 1320.62058
[11] Quiñonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939-1959 (2005) · Zbl 1222.68282
[12] Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA (2006) · Zbl 1177.68165
[13] Roos, T., Zou, Y.: Keep it simple stupid—On the effect of lower-order terms in BIC-like criteria. Information Theory and Applications Workshop, February 2013, San Diego, USA (2013) · Zbl 1190.62119
[14] Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Series B (Stat. Methodol.) 71, 319-392 (2009) · Zbl 1248.62156 · doi:10.1111/j.1467-9868.2008.00700.x
[15] Vanhatalo, J., et al.: GPstuff: Bayesian modeling with Gaussian processes. J. Mach. Learn. Res. 14, 1175-1179 (2013) · Zbl 1320.62010
[16] Watanabe, S.: Algebraic Geometry and Statistical Learning Theory. Cambridge University Press, Cambridge (2009) · Zbl 1180.93108 · doi:10.1017/CBO9780511800474
[17] Watanabe, S.: A widely applicable Bayesian information criterion. J. Mach. Learn. Res. 14, 867-897 (2013) · Zbl 1320.62058
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.