Accelerated dimension-independent adaptive Metropolis. (English) Zbl 1352.65029

Summary: This work describes improvements by algorithmic and architectural means to black-box Bayesian inference over high-dimensional parameter spaces. The well-known adaptive Metropolis (AM) algorithm [H. Haario et al., Bernoulli 7, No. 2, 223–242 (2001; Zbl 0989.65004)] is extended herein to scale asymptotically uniformly with respect to the underlying parameter dimension for Gaussian targets, by respecting the variance of the target. The resulting algorithm, referred to as the dimension-independent adaptive Metropolis algorithm, also shows improved performance with respect to adaptive Metropolis on non-Gaussian targets. This algorithm is further improved, and the possibility of probing high-dimensional (with dimension \(d \geq 1000\)) targets is enabled, via GPU-accelerated numerical libraries and periodically synchronized concurrent chains (justified a posteriori). Asymptotically in dimension, this GPU implementation exhibits a factor of four improvement versus a competitive CPU-based Intel MKL (math kernel library) parallel version alone. Strong scaling to concurrent chains is exhibited, through a combination of longer time per sample batch (weak scaling) with fewer necessary samples to convergence. The algorithm performance is illustrated on several Gaussian and non-Gaussian target examples, in which the dimension may be in excess of one thousand.
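The summary builds on the adaptive Metropolis (AM) scheme of Haario et al., in which the Gaussian random-walk proposal covariance is learned online from the chain's own empirical covariance. The following is a minimal sketch of that baseline AM update (not of the paper's dimension-independent variant); the function name, the burn-in length `t0`, and the regularization `eps` are illustrative choices, and the scaling `2.4^2/d` is the classical Gelman-Roberts-Gilks tuning.

```python
import numpy as np

def adaptive_metropolis(log_target, x0, n_samples, t0=200, eps=1e-8, rng=None):
    """Sketch of the Haario et al. adaptive Metropolis sampler.

    The proposal covariance is sd * cov + sd * eps * I, where cov is the
    running empirical covariance of the chain (Welford rank-1 updates).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    sd = 2.4**2 / d                      # classical optimal-scaling constant
    lp = log_target(x)
    chain = np.empty((n_samples, d))
    mean = x.copy()                      # running mean of the chain
    M2 = np.zeros((d, d))                # running sum of outer products
    n = 1                                # samples seen so far (x0 included)
    cov = np.eye(d)
    for t in range(n_samples):
        if t < t0:
            prop_cov = (0.1**2 / d) * np.eye(d)   # fixed early-phase proposal
        else:
            prop_cov = sd * cov + sd * eps * np.eye(d)
        y = rng.multivariate_normal(x, prop_cov)  # symmetric RW proposal
        lpy = log_target(y)
        if np.log(rng.random()) < lpy - lp:       # Metropolis accept/reject
            x, lp = y, lpy
        chain[t] = x
        # Welford update of the running mean and covariance
        n += 1
        delta = x - mean
        mean += delta / n
        M2 += np.outer(delta, x - mean)
        cov = M2 / (n - 1)
    return chain
```

The adaptation violates the Markov property, but ergodicity is preserved because the adaptation diminishes as the empirical covariance stabilizes (the "diminishing adaptation" condition studied in the ergodicity references below).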

MSC:

65C60 Computational problems in statistics (MSC2010)
62F15 Bayesian inference
65C05 Monte Carlo methods
65C40 Numerical analysis or methods applied to Markov chains
60J22 Computational methods in Markov chains
65Y20 Complexity and performance of numerical algorithms
65Y05 Parallel numerical computation

Citations:

Zbl 0989.65004

References:

[1] A. Abdelfattah, H. Ltaief, and D. Keyes, {\it KBLAS: An optimized library for dense matrix-vector multiplication on GPU accelerators}, ACM Trans. Math. Softw., 42 (2016), 18, doi:10.1145/2818311. · Zbl 1369.65042
[2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, {\it LAPACK User’s Guide}, 3rd ed., Software Environ. Tools 9, SIAM, Philadelphia, 1999. · Zbl 0934.65030
[3] C. Andrieu and É. Moulines, {\it On the ergodicity properties of some adaptive MCMC algorithms}, Ann. Appl. Probab., 16 (2006), pp. 1462-1505. · Zbl 1114.65001
[4] C. Andrieu and J. Thoms, {\it A tutorial on adaptive MCMC}, Statist. Comput., 18 (2008), pp. 343-373.
[5] A. Beskos, N. Pillai, G. Roberts, J.-M. Sanz-Serna, and A. Stuart, {\it Optimal tuning of the hybrid Monte Carlo algorithm}, Bernoulli, 19 (2013), pp. 1501-1534. · Zbl 1287.60090
[6] A. Beskos, G. O. Roberts, and A. M. Stuart, {\it Optimal scalings of Metropolis-Hastings algorithms for non-product targets in high dimensions}, Ann. Appl. Probab., 19 (2009), pp. 863-898. · Zbl 1172.60328
[7] A. Beskos, G. O. Roberts, A. M. Stuart, and J. Voss, {\it MCMC methods for diffusion bridges}, Stochast. Dyn., 8 (2008), pp. 319-350. · Zbl 1159.65007
[8] L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio, B. van Bloemen Waanders, K. Willcox, and Y. Marzouk, {\it Large-Scale Inverse Problems and Quantification of Uncertainty}, Wiley Ser. Comput. Statist. 712, John Wiley & Sons, New York, 2011. · Zbl 1203.62002
[9] V. I. Bogachev, {\it Gaussian Measures}, Math. Surveys Monogr. 62, American Mathematical Society, Providence, RI, 1998.
[10] S. P. Brooks and A. Gelman, {\it General methods for monitoring convergence of iterative simulations}, J. Comput. Graph. Statist., 7 (1998), pp. 434-455.
[11] T. Bui-Thanh and M. Girolami, {\it Solving large-scale PDE-constrained Bayesian inverse problems with Riemann manifold Hamiltonian Monte Carlo}, Inverse Problems, 30 (2014), 114014. · Zbl 1306.65269
[12] B. Calderhead, {\it A general construction for parallelizing Metropolis-Hastings algorithms}, Proc. Natl. Acad. Sci. USA, 111 (2014), pp. 17408-17413.
[13] S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White, {\it MCMC methods for functions: Modifying old algorithms to make them faster}, Statist. Sci., 28 (2013), pp. 424-446. · Zbl 1331.62132
[14] R. V. Craiu and X.-L. Meng, {\it Multiprocess parallel antithetic coupling for backward and forward Markov chain Monte Carlo}, Ann. Statist., 33 (2005), pp. 661-697. · Zbl 1068.62090
[15] R. V. Craiu, J. Rosenthal, and C. Yang, {\it Learn from thy neighbor: Parallel-chain and regional adaptive MCMC}, J. Amer. Statist. Assoc., 104 (2009), pp. 1454-1466. · Zbl 1205.65028
[16] T. Cui, K. J. H. Law, and Y. M. Marzouk, {\it Dimension-independent likelihood-informed MCMC}, J. Comput. Phys., 304 (2016), pp. 109-137, doi:10.1016/j.jcp.2015.10.008. · Zbl 1349.65009
[17] T. Cui, J. Martin, Y. M. Marzouk, A. Solonen, and A. Spantini, {\it Likelihood-informed dimension reduction for nonlinear inverse problems}, Inverse Problems, 30 (2014), 114015. · Zbl 1310.62030
[18] P. Del Moral, A. Doucet, and A. Jasra, {\it Sequential Monte Carlo samplers}, J. Roy. Statist. Soc. Ser. B, 68 (2006), pp. 411-436. · Zbl 1105.62034
[19] J. Dongarra, P. Beckman, P. Aerts, F. Cappello, T. Lippert, S. Matsuoka, P. Messina, S. Moore, A. Trefethen, and M. Valero, {\it The international exascale software project: A call to cooperative action by the global high performance community}, Int. J. High Perform. Comput. Appl., 23 (2009), pp. 309-322.
[20] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, {\it LINPACK Users’ Guide}, SIAM, Philadelphia, 1979.
[21] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, {\it Hybrid Monte Carlo}, Phys. Lett. B, 195 (1987), pp. 216-222.
[22] H. P. Flath, L. C. Wilcox, V. Akçelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, {\it Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations}, SIAM J. Sci. Comput., 33 (2011), pp. 407-432. · Zbl 1229.65174
[23] G. Fort, E. Moulines, and P. Priouret, {\it Convergence of adaptive and interacting Markov chain Monte Carlo algorithms}, Ann. Statist., 39 (2011), pp. 3262-3289. · Zbl 1246.65003
[24] G. Fort, E. Moulines, P. Priouret, and P. Vandekerkhove, {\it A central limit theorem for adaptive and interacting Markov chains}, Bernoulli, 20 (2014), pp. 457-485. · Zbl 1303.60020
[25] A. Gelman, G. Roberts, and W. Gilks, {\it Efficient Metropolis jumping rules}, Bayesian Statist., 5 (1996), pp. 599-608.
[26] A. Gelman and D. B. Rubin, {\it Inference from iterative simulation using multiple sequences}, Statist. Sci., 7 (1992), pp. 457-472. · Zbl 1386.65060
[27] C. J. Geyer, {\it Markov chain Monte Carlo maximum likelihood}, in Computer Science and Statistics: Proceedings of the 23rd Symposium on the Interface, Interface Foundation, Fairfax Station, VA, 1991, pp. 156-163.
[28] S. Ghosal, J. K. Ghosh, and A. W. Van Der Vaart, {\it Convergence rates of posterior distributions}, Ann. Statist., 28 (2000), pp. 500-531. · Zbl 1105.62315
[29] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, {\it Markov Chain Monte Carlo in Practice}, Chapman & Hall/CRC, New York, 1996. · Zbl 0832.00018
[30] W. R. Gilks, {\it Markov Chain Monte Carlo}, Wiley Online Library, 2005.
[31] M. Girolami and B. Calderhead, {\it Riemann manifold Langevin and Hamiltonian Monte Carlo methods}, J. Roy. Statist. Soc. Ser. B, 73 (2011), pp. 123-214. · Zbl 1411.62071
[32] H. Haario, M. Laine, A. Mira, and E. Saksman, {\it DRAM: Efficient adaptive MCMC}, Statist. Comput., 16 (2006), pp. 339-354.
[33] H. Haario, E. Saksman, and J. Tamminen, {\it An adaptive Metropolis algorithm}, Bernoulli, 7 (2001), pp. 223-242. · Zbl 0989.65004
[34] H. Haario, E. Saksman, and J. Tamminen, {\it Componentwise adaptation for high dimensional MCMC}, Comput. Statist., 20 (2005), pp. 265-273. · Zbl 1091.65009
[35] W. K. Hastings, {\it Monte Carlo sampling methods using Markov chains and their applications}, Biometrika, 57 (1970), pp. 97-109. · Zbl 0219.65008
[36] W. K. Hastings, {\it Monte Carlo sampling methods using Markov chains and their applications}, Biometrika, 57 (1970), pp. 97-109. · Zbl 0219.65008
[37] K. Hukushima and K. Nemoto, {\it Exchange Monte Carlo method and application to spin glass simulations}, Journal of the Physical Society of Japan, 65 (1996), pp. 1604-1608.
[38] Intel, {\it Math Kernel Library}.
[39] P. Jacob, C. P. Robert, and M. H. Smith, {\it Using parallel computation to improve independent Metropolis-Hastings based estimation}, J. Comput. Graph. Statist., 20 (2011), pp. 616-635.
[40] J. P. Kaipio and E. Somersalo, {\it Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography}, Inverse Problems, 16 (2000), pp. 1487-1522. · Zbl 1044.78513
[41] P. E. Kloeden and E. Platen, {\it Numerical Solution of Stochastic Differential Equations}, Appl. Math. 23, Springer-Verlag, Berlin, 1992. · Zbl 0752.60043
[42] A. Korattikara, Y. Chen, and M. Welling, {\it Austerity in MCMC land: Cutting the Metropolis-Hastings budget}, in Proceedings of the 31st International Conference on Machine Learning, J. Machine Learning Res., 32 (2014).
[43] S. C. Kou, Q. Zhou, and W. H. Wong, {\it Equi-energy sampler with applications in statistical inference and statistical mechanics} (discussion paper), Ann. Statist., 34 (2006), pp. 1581-1619. · Zbl 1246.82054
[44] S. Kullback and R. A. Leibler, {\it On information and sufficiency}, Ann. Math. Stat., 22 (1951), pp. 79-86. · Zbl 0042.38403
[45] K. J. H. Law, {\it Proposals which speed up function-space MCMC}, J. Comput. Appl. Math., 262 (2014), pp. 127-138. · Zbl 1301.65004
[46] O. P. Le Ma\^\itre and O. M. Knio, {\it Spectral Methods for Uncertainty Quantification: With Applications to Computational Fluid Dynamics}, Springer Science & Business Media, New York, 2010. · Zbl 1193.76003
[47] A. Lee, C. Yau, M. B. Giles, A. Doucet, and C. C. Holmes, {\it On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods}, J. Comput. Graph. Statist., 19 (2010), pp. 769-789.
[48] D. Maclaurin and R. P. Adams, {\it Firefly Monte Carlo: Exact MCMC with subsets of data}, in Proceedings of the International Joint Conference on Artificial Intelligence, 2015.
[49] J. Martin, L. C. Wilcox, C. Burstedde, and O. Ghattas, {\it A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion}, SIAM J. Sci. Comput., 34 (2012), pp. A1460-A1487. · Zbl 1250.65011
[50] B. Matérn et al., {\it Spatial Variation. Stochastic Models and Their Application to Some Problems in Forest Surveys and Other Sampling Investigations}, Meddelanden fran statens Skogsforskningsinstitut, 49 (1960), 5.
[51] J. C. Mattingly, N. S. Pillai, and A. M. Stuart, {\it SPDE limits of the random walk Metropolis algorithm in high dimensions}, Ann. Appl. Probab., 22 (2012), pp. 881-930. · Zbl 1254.60081
[52] Message Passing Interface Forum, {\it MPI: A message passing interface}, in Proceedings of Supercomputing ’93, IEEE Computer Society Press, Piscataway, NJ, 1993, pp. 878-883.
[53] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, {\it Equation of state calculations by fast computing machines}, J. Chem. Phys., 21 (1953), pp. 1087-1092. · Zbl 1431.65006
[54] S. P. Meyn and R. L. Tweedie, {\it Markov Chains and Stochastic Stability}, Communications and Control Engineering Series, Springer-Verlag, London, 1993. · Zbl 0925.60001
[55] S. Minsker, S. Srivastava, L. Lin, and D. B. Dunson, {\it Robust and Scalable Bayes via a Median of Subset Posterior Measures}, preprint, 2014. · Zbl 1442.62056
[56] R. M. Neal, {\it Bayesian Learning for Neural Networks}, Ph.D. Thesis, Department of Computer Science, University of Toronto, Toronto, ON, 1994. · Zbl 0888.62021
[57] NVIDIA, {\it CUDA: Compute Unified Device Architecture – Parallel Computing Platform and Programming Model}.
[58] NVIDIA, {\it The CUDA Basic Linear Algebra Subroutines (cuBLAS)}.
[59] B. Øksendal, {\it Stochastic Differential Equations}, 5th ed., Universitext, Springer-Verlag, Berlin, 1998.
[60] N. S. Pillai, A. M. Stuart, and A. H. Thiery, {\it Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions}, Ann. Appl. Probab., 22 (2012), pp. 2320-2356. · Zbl 1272.60053
[61] C. P. Robert and G. C. Casella, {\it Monte Carlo Statistical Methods}, Springer Texts Statist., Springer-Verlag, Berlin, 1999. · Zbl 0935.62005
[62] G. O. Roberts, A. Gelman, and W. R. Gilks, {\it Weak convergence and optimal scaling of random walk Metropolis algorithms}, Ann. Appl. Probab., 7 (1997), pp. 110-120. · Zbl 0876.60015
[63] G. O. Roberts and J. Rosenthal, {\it Optimal scaling of discrete approximations to Langevin diffusions}, J. Roy. Statist. Soc. Ser. B, 60 (1998), pp. 255-268. · Zbl 0913.60060
[64] G. O. Roberts and J. Rosenthal, {\it Optimal scaling for various Metropolis-Hastings algorithms}, Statist. Sci., 16 (2001), pp. 351-367. · Zbl 1127.65305
[65] G. O. Roberts and J. S. Rosenthal, {\it Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms}, J. Appl. Probab., 44 (2007), pp. 458-475. · Zbl 1137.62015
[66] G. O. Roberts and R. L. Tweedie, {\it Exponential convergence of Langevin distributions and their discrete approximations}, Bernoulli, 2 (1996), pp. 341-363. · Zbl 0870.60027
[67] P. J. Rossky, J. D. Doll, and H. L. Friedman, {\it Brownian dynamics as smart Monte Carlo simulation}, J. Chem. Phys., 69 (1978), pp. 4628-4633.
[68] E. Saksman and M. Vihola, {\it On the ergodicity of the adaptive Metropolis algorithm on unbounded domains}, Ann. Appl. Probab., 20 (2010), pp. 2178-2203. · Zbl 1209.65004
[69] C. Schillings and Ch. Schwab, {\it Scaling Limits in Computational Bayesian Inversion}, Technical Report 2014-26, Seminar for Applied Mathematics, ETH Zürich, Switzerland, 2014. · Zbl 1358.65013
[70] S. L. Scott, A. W. Blocker, F. V. Bonassi, H. Chipman, E. George, and R. McCulloch, {\it Bayes and big data: The consensus Monte Carlo algorithm}, Int. J. Management Sci. and Engrg. Management, 11 (2016), pp. 78-88.
[71] A. Solonen, P. Ollinaho, M. Laine, H. Haario, J. Tamminen, and H. Järvinen, {\it Efficient MCMC for climate model parameter estimation: Parallel adaptive chains and early rejection}, Bayesian Anal., 7 (2012), pp. 715-736. · Zbl 1330.60091
[72] I. Strid, {\it Efficient parallelisation of Metropolis-Hastings algorithms using a prefetching approach}, Comput. Statist. Data Anal., 54 (2010), pp. 2814-2835. · Zbl 1284.62036
[73] A. M. Stuart, {\it Inverse problems: A Bayesian approach}, Acta Numer., 19 (2010), pp. 451-559. · Zbl 1242.65142
[74] M. A. Suchard, Q. Wang, C. Chan, J. Frelinger, A. Cron, and M. West, {\it Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures}, J. Comput. Graph. Statist., 19 (2010), pp. 419-438.
[75] A. Tarantola, {\it Inverse Problem Theory and Methods for Model Parameter Estimation}, SIAM, Philadelphia, 2005. · Zbl 1074.65013
[76] L. Tierney, {\it A note on Metropolis-Hastings kernels for general state spaces}, Ann. Appl. Probab., 8 (1998), pp. 1-9. · Zbl 0935.60053
[77] M. Vihola, {\it Robust adaptive Metropolis algorithm with coerced acceptance rate}, Statist. Comput., 22 (2012), pp. 997-1008. · Zbl 1252.65024
[78] S. J. Vollmer, {\it Dimension-independent MCMC sampling for inverse problems with non-Gaussian priors}, SIAM/ASA J. Uncertainty Quant., 3 (2015), pp. 535-561. · Zbl 1322.65008
[79] P. Whittle, {\it Prediction and Regulation by Linear Least-Square Methods}, English Universities Press, London, 1963.
[80] D. J. Wilkinson, {\it Parallel Bayesian computation}, Statist. Textbooks Monogr., 184 (2006), p. 477.
[81] A. YarKhan, J. Kurzak, and J. Dongarra, {\it QUARK Users’ Guide: QUeueing And Runtime for Kernels}, University of Tennessee Innovative Computing Laboratory Technical Report ICL-UT-11-02, Knoxville, TN, 2011.