×

Semidual regularized optimal transport. (English) Zbl 1402.49037

Summary: Variational problems that involve Wasserstein distances and more generally Optimal Transport (OT) theory are playing an increasingly important role in data sciences. Such problems can be used to form an exemplar measure out of various probability measures, as in the Wasserstein barycenter problem, or to carry out parametric inference and density fitting, where the loss is measured in terms of an optimal transport cost to the measure of observations. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. Entropic regularization has recently emerged as an efficient tool to approximate the solution of such variational Wasserstein problems. In this paper, we give a thorough duality tour of these regularization techniques. In particular, we show how important concepts from classical OT such as \(c\)-transforms and semidiscrete approaches translate into similar ideas in a regularized setting. These dual formulations lead to smooth variational problems, which can be solved using smooth, differentiable, and convex optimization problems that are simpler to implement and numerically more stable than their unregularized counterparts. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spatial regularization functionals.

MSC:

49Q20 Variational problems in a geometric measure-theoretic setting
93E20 Optimal stochastic control
49N60 Regularity of solutions in optimal control
49J20 Existence theories for optimal control problems involving partial differential equations
65D99 Numerical approximation and computational geometry (primarily algorithms)
68U10 Computing methodologies for image processing
90C25 Convex programming

Software:

EMD
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] I. Abraham, R. Abraham, M. Bergounioux, and G. Carlier, Tomographic reconstruction from a few views: A multi-marginal optimal transport approach, Appl. Math. Optim., 75 (2017), pp. 55–73. · Zbl 1456.49035
[2] M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM J. Math. Anal., 43 (2011), pp. 904–924, . · Zbl 1223.49045
[3] Z. Allen-Zhu, Y. Li, R. Oliveira, and A. Wigderson, Much Faster Algorithms for Matrix Scaling, arXiv preprint, , 2017.
[4] L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures, Springer, New York, 2006.
[5] F. Aurenhammer, Power diagrams: Properties, algorithms and applications, SIAM J. Comput., 16 (1987), pp. 78–96, . · Zbl 0616.52007
[6] F. Aurenhammer, F. Hoffmann, and B. Aronov, Minkowski-type theorems and least-squares clustering, Algorithmica, 20 (1998), pp. 61–76. · Zbl 0895.68135
[7] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman divergences, J. Mach. Learn. Res., 6 (2005), pp. 1705–1749. · Zbl 1190.62117
[8] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer-Verlag, New York, 2011. · Zbl 1218.47001
[9] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183–202, . · Zbl 1175.94009
[10] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM J. Sci. Comput., 37 (2015), pp. A1111–A1138, . · Zbl 1319.49073
[11] J. Bigot and T. Klein, Consistent Estimation of a Population Barycenter in the Wasserstein Space, preprint, , 2012.
[12] N. Bonneel, G. Peyré, and M. Cuturi, Wasserstein barycentric coordinates: Histogram regression using optimal transport, ACM Trans. Graphics, 35 (2016), art. 71.
[13] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, Sliced and radon Wasserstein barycenters of measures, J. Math. Imaging Vision, 51 (2015), pp. 22–45. · Zbl 1332.94014
[14] N. Bonneel, M. van de Panne, S. Paris, and W. Heidrich, Displacement interpolation using Lagrangian mass transport, ACM Trans. Graphics (SIGGRAPH ASIA’11), 30 (2011), art. 158.
[15] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004. · Zbl 1058.90049
[16] M. Burger, M. Franeka, and C.-B. Schonlieb, Regularised regression and density estimation based on optimal transport, Appl. Math. Res. Express, 2 (2012), pp. 209–253.
[17] G. Carlier, A. Oberman, and E. Oudet, Numerical methods for matching for teams and Wasserstein barycenters, ESAIM Math. Model. Numer. Anal., 49 (2015), pp. 1621–1642. · Zbl 1331.49042
[18] G. Carlier and C. Poon, On the total variation Wasserstein gradient flow and the TV-JKO scheme, ESAIM Control Optim. Calc. Var., to appear, .
[19] V. Caselles, A. Chambolle, and M. Novaga, The discontinuity set of solutions of the TV denoising problem and some extensions, Multiscale Model. Simul., 6 (2007), pp. 879–894, . · Zbl 1145.49024
[20] E. Cazelles, J. Bigot, and N. Papadakis, Regularized barycenters in the Wasserstein space, in Geometric Science of Information, F. Nielsen and F. Barbaresco, eds., Springer International, Cham, 2017, pp. 83–90. · Zbl 1425.62074
[21] T. M. Chan, Optimal output-sensitive convex hull algorithms in two and three dimensions, Discrete Comput. Geom., 16 (1996), pp. 361–368. · Zbl 0857.68111
[22] S. Claici, E. Chien, and J. Solomon, Stochastic Wasserstein barycenters, in ICML’18, 2018.
[23] M. B. Cohen, A. Madry, D. Tsipras, and A. Vladu, Matrix scaling and balancing via box constrained Newton’s method and interior point methods, in Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), IEEE, 2017, pp. 902–913.
[24] R. Cominetti and J. San Martín, Asymptotic analysis of the exponential penalty trajectory in linear programming, Math. Programming, 67 (1994), pp. 169–187. · Zbl 0833.90081
[25] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, Optimal transport for domain adaptation, IEEE Trans. Pattern Anal. Mach. Intell., 39 (2017), pp. 1853–1865, .
[26] M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, in Advances in Neural Information Processing Systems 26, 2013, pp. 2292–2300.
[27] M. Cuturi and A. Doucet, Fast computation of Wasserstein barycenters, in Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 685–693.
[28] M. Cuturi and G. Peyré, A smoothed dual approach for variational Wasserstein problems, SIAM J. Imaging Sci., 9 (2016), pp. 320–343, . · Zbl 1335.49076
[29] F. De Goes, K. Breeden, V. Ostromoukhov, and M. Desbrun, Blue noise through optimal transport, ACM Trans. Graphics, 31 (2012), art. 171.
[30] J. Franklin and J. Lorenz, On the scaling of multidimensional matrices, Linear Algebra Appl., 114 (1989), pp. 717–735. · Zbl 0674.15001
[31] A. Genevay, M. Cuturi, G. Peyré, and F. Bach, Stochastic optimization for large-scale optimal transport, in Advances in Neural Information Processing Systems, 2016, pp. 3440–3448.
[32] U. Gianazza, G. Savaré, and G. Toscani, The Wasserstein gradient flow of the Fisher information and the quantum drift-diffusion equation, Arch. Rational Mech. Anal., 194 (2009), pp. 133–220, . · Zbl 1223.35264
[33] R. Jordan, D. Kinderlehrer, and F. Otto, The variational formulation of the Fokker–Planck equation, SIAM J. Math. Anal., 29 (1998), pp. 1–17, . · Zbl 0915.35120
[34] P. A. Knight and D. Ruiz, A fast algorithm for matrix balancing, IMA J. Numer. Anal., 33 (2013), pp. 1029–1047. · Zbl 1276.65025
[35] D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401 (1999), pp. 788–791. · Zbl 1369.68285
[36] J. Lellmann, D. A. Lorenz, C. Schönlieb, and T. Valkonen, Imaging with Kantorovich–Rubinstein discrepancy, SIAM J. Imaging Sci., 7 (2014), pp. 2833–2859, . · Zbl 1308.49043
[37] C. Léonard, A survey of the Schrödinger problem and some of its connections with optimal transport, Discrete Contin. Dyn. Syst., 34 (2014), pp. 1533–1574. · Zbl 1277.49052
[38] B. Lévy, A numerical algorithm for \(l^2\) semi-discrete optimal transport in 3D, ESAIM Math. Model. Numer. Anal., 49 (2015), pp. 1693–1715. · Zbl 1331.49037
[39] B. Lévy and E. Schwindt, Notions of optimal transport theory and how to implement them on a computer, Comput. & Graphics, 72 (2018), pp. 135–148.
[40] J. Maas, M. Rumpf, C. Schönlieb, and S. Simon, A generalized model for optimal transport of images including dissipation and density modulation, ESAIM Math. Model. Numer. Anal., 49 (2015), pp. 1745–1769. · Zbl 1348.94009
[41] Q. Mérigot, A multiscale approach to optimal transport, Comput. Graph. Forum, 30 (2011), pp. 1583–1592. · Zbl 1431.76040
[42] S. Minsker, S. Srivastava, L. Lin, and D. B. Dunson, Robust and scalable Bayes via a median of subset posterior measures, J. Mach. Learn. Res., 18 (2017), pp. 1–40, . · Zbl 1442.62056
[43] F. Nielsen, Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight approximation for frequency histograms, IEEE Signal Process. Lett., 20 (2013), pp. 657–660.
[44] V. Oliker and L. D. Prussner, On the numerical solution of the equation \(\frac{∂^2 z}{∂ x^2} \frac{∂^2 z}{∂ y^2}-( \frac{∂^2 z}{∂ x∂ y} )^2=f\) and its discretizations, I, Numer. Math., 54 (1989), pp. 271–293. · Zbl 0659.65116
[45] F. Otto, The geometry of dissipative evolution equations: The porous medium equation, Comm. Partial Differential Equations, 26 (2001), pp. 101–174. · Zbl 0984.35089
[46] O. Pele and M. Werman, Fast and robust earth mover’s distances, in ICCV’09, 2009.
[47] G. Peyré, Entropic approximation of Wasserstein gradient flows, SIAM J. Imaging Sci., 8 (2015), pp. 2323–2351, . · Zbl 1335.90068
[48] G. Peyré and M. Cuturi, Computational Optimal Transport, preprint, , 2018.
[49] J. Rabin and N. Papadakis, Convex color image segmentation with optimal transport distances, in Proc. SSVM’15, 2015. · Zbl 1376.94007
[50] Y. Rubner, C. Tomasi, and L. Guibas, The earth mover’s distance as a metric for image retrieval, Internat. J. Computer Vision, 40 (2000), pp. 99–121. · Zbl 1012.68705
[51] F. Santambrogio, Optimal Transport for Applied Mathematicians, Birkhäuser, 2015.
[52] B. Schmitzer, Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems, preprint, , 2016.
[53] B. Schmitzer and C. Schnorr, Object segmentation by shape matching with Wasserstein modes, in Proc. EMMCVPR’13, Lecture Notes in Comput. Sci. 8081, Springer, Berlin, Heidelberg, 2013, pp. 123–136.
[54] R. Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic matrices, Ann. Math. Statist., 35 (1964), pp. 876–879. · Zbl 0134.25302
[55] J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas, Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains, ACM Trans. Graphics (Proc. SIGGRAPH 2015), 34 (2015), art. 66, . · Zbl 1334.68267
[56] J. Solomon, R. Rustamov, L. Guibas, and A. Butscher, Wasserstein propagation for semi-supervised learning, in Proc. ICML 2014, 2014.
[57] M. Sugiyama, H. Nakahara, and K. Tsuda, Tensor balancing on statistical manifold, in Proceedings of ICML, 2017, pp. 3270–3279.
[58] P. Swoboda and C. Schnörr, Convex variational image restoration with histogram priors, SIAM J. Imaging Sci., 6 (2013), pp. 1719–1735, . · Zbl 1281.65041
[59] C. Villani, Optimal Transport: Old and New, Grundlehren Math. Wiss. 338, Springer-Verlag, 2009. · Zbl 1156.53003
[60] G.-S. Xia, S. Ferradans, G. Peyré, and J.-F. Aujol, Synthesizing and mixing stationary Gaussian texture models, SIAM J. Imaging Sci., 7 (2014), pp. 476–508, . · Zbl 1391.94108
[61] G. Zen, E. Ricci, and N. Sebe, Simultaneous ground metric learning and matrix factorization with earth movers distance, in Proc. ICPR’14, 2014.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.