
Proximal Markov chain Monte Carlo algorithms. (English) Zbl 1505.62315

Summary: This paper presents a new Metropolis-adjusted Langevin algorithm (MALA) that uses convex analysis to simulate efficiently from high-dimensional densities that are log-concave, a class of probability distributions that is widely used in modern high-dimensional statistics and data analysis. The method is based on a new first-order approximation for Langevin diffusions that exploits log-concavity to construct Markov chains with favourable convergence properties. This approximation is closely related to Moreau-Yosida regularisations for convex functions and uses proximity mappings instead of gradient mappings to approximate the continuous-time process. The proposed method complements existing MALA methods in two ways. First, the method is shown to have very robust stability properties and to converge geometrically for many target densities for which other MALA algorithms are not geometrically ergodic, or are so only if the step size is sufficiently small. Second, the method can be applied to high-dimensional target densities that are not continuously differentiable, a class of distributions that is increasingly used in image processing and machine learning and that is beyond the scope of existing MALA and HMC algorithms. To use this method it is necessary to compute, or to approximate efficiently, the proximity mappings of the logarithm of the target density. For several popular models, including many Bayesian models used in modern signal and image processing and machine learning, this can be achieved with convex optimisation algorithms and with approximations based on proximal splitting techniques, which can be implemented in parallel. The proposed method is demonstrated on two challenging high-dimensional and non-differentiable models, related to image resolution enhancement and low-rank matrix estimation, that are not well addressed by existing MCMC methodology.
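
To make the proposal mechanism concrete, the following is a minimal sketch of one proximal-MALA-type iteration in which the Gaussian proposal is centred at a proximity mapping of the negative log-density rather than at a gradient step. It is an illustration under stated assumptions, not the paper's implementation: the l1 target g(x) = ||x||_1 (whose proximity mapping is soft-thresholding), the helper names prox_g and pmala_step, and the step size delta are choices made purely for the example.

import numpy as np

# Minimal sketch of one proximal-MALA-type step for a target pi(x) proportional to exp(-g(x)),
# with g convex. Here g(x) = ||x||_1 is a hypothetical example whose proximity mapping
# (soft-thresholding) is available in closed form.

def g(x):
    return np.sum(np.abs(x))

def prox_g(x, lam):
    # prox_{lam*g}(x) = argmin_u { g(u) + ||u - x||^2 / (2*lam) }; soft-thresholding for the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def pmala_step(x, delta, rng):
    # Proposal: Gaussian centred at the proximity mapping of the current state,
    # in place of the gradient-based drift of classical MALA.
    mean_x = prox_g(x, delta / 2.0)
    y = mean_x + np.sqrt(delta) * rng.standard_normal(x.shape)

    # Metropolis-Hastings correction so that the chain targets pi exactly.
    mean_y = prox_g(y, delta / 2.0)
    log_q_y_given_x = -np.sum((y - mean_x) ** 2) / (2.0 * delta)
    log_q_x_given_y = -np.sum((x - mean_y) ** 2) / (2.0 * delta)
    log_alpha = (g(x) - g(y)) + (log_q_x_given_y - log_q_y_given_x)

    if np.log(rng.uniform()) < log_alpha:
        return y   # accept the proposal
    return x       # reject and keep the current state

# Example usage: run a short chain from an arbitrary starting point.
rng = np.random.default_rng(0)
x = np.ones(10)
samples = []
for _ in range(1000):
    x = pmala_step(x, delta=0.5, rng=rng)
    samples.append(x.copy())

Replacing prox_g by the proximity mapping of another convex negative log-density (computed, as the summary indicates, by convex optimisation or proximal splitting) leaves the Metropolis-Hastings correction unchanged.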

MSC:

62-08 Computational methods for problems pertaining to statistics
65C05 Monte Carlo methods
62F15 Bayesian inference
62H12 Estimation in multivariate analysis
90C25 Convex programming
94A08 Image processing (compression, reconstruction, etc.) in information and communication theory

Software:

BayesDA; NESTA; UNLocBoX
Full Text: DOI arXiv
