zbMATH — the first resource for mathematics

Markov chain Monte Carlo sampling using a reservoir method. (English) Zbl 07080186
Summary: Markov chain Monte Carlo methods are widely used to draw samples from a target distribution that is hard to characterize analytically, while reservoir sampling was developed to obtain a sample from a data stream sequentially in a single pass. A stochastic thinning algorithm based on reservoir sampling is proposed; it can be embedded in most Markov chain Monte Carlo methods to reduce the autocorrelation of the generated sample. Under mild conditions, the distribution of the sample generated by the proposed algorithm converges in total variation to the target distribution in probability. A practical method for detecting the convergence of the proposed algorithm is also introduced. Two simulation studies compare the proposed algorithm with the corresponding Markov chain Monte Carlo methods without thinning: the estimation bias of the proposed algorithm is approximately the same as that of the corresponding method, but its Monte Carlo variance is smaller. The proposed algorithm also saves memory, since only a small portion of the Markov chain needs to be stored at each iteration.
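The reservoir technique the summary refers to is classically illustrated by Vitter's Algorithm R: a uniform sample of fixed size is maintained over a stream of unknown length in one pass. The sketch below is this classic algorithm, not the authors' stochastic thinning variant; the function name and seeding are illustrative choices.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Classic Algorithm R (Vitter, 1985): return a uniform sample of
    size k from an iterable of unknown length in a single pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(x)
        else:
            # Item i survives with probability k / (i + 1): draw a slot
            # uniformly from {0, ..., i} and keep x only if it lands
            # inside the reservoir.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = x
    return reservoir
```

The single-pass, fixed-memory property is what makes the approach attractive for thinning: the chain can be consumed as a stream without storing it in full.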
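For context on the comparison reported in the simulation studies, the conventional baseline is fixed-interval thinning of an MCMC chain: keep every \(m\)-th draw and discard the rest. A minimal sketch, using a random-walk Metropolis sampler targeting a standard normal (the target, step size, and function name are illustrative assumptions, not taken from the paper):

```python
import math
import random

def mh_normal(n_draws, thin=1, seed=1):
    """Random-walk Metropolis targeting N(0, 1), keeping every
    `thin`-th draw (conventional fixed-interval thinning)."""
    rng = random.Random(seed)
    log_target = lambda v: -0.5 * v * v  # log density up to a constant
    x, kept = 0.0, []
    for t in range(n_draws):
        prop = x + rng.gauss(0.0, 1.0)   # symmetric proposal
        # Metropolis acceptance: accept with prob min(1, pi(prop)/pi(x)).
        if math.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        if (t + 1) % thin == 0:
            kept.append(x)               # retain every thin-th state
    return kept
```

Thinning this way reduces autocorrelation among the retained draws at the cost of discarding most of the chain; the proposed reservoir-based thinning pursues the same goal while deciding stochastically, in a single pass, which states to retain.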
MSC:
 62 Statistics
Software:
 BayesDA