zbMATH — the first resource for mathematics

Differentiable game mechanics. (English) Zbl 07064064
Summary: Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, that exhibit multiple interacting losses. The behavior of gradient-based methods in games is not well understood – and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new tools to understand and control the dynamics in \(n\)-player differentiable games.
The key result is to decompose the game Jacobian into two components. The first, symmetric component, is related to potential games, which reduce to gradient descent on an implicit function. The second, antisymmetric component, relates to Hamiltonian games, a new class of games that obey a conservation law akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs – while at the same time being applicable to, and having guarantees in, much more general cases.
68T05 Learning and adaptive systems in artificial intelligence
CycleGAN; pix2pix
Full Text: Link
[1] S Abdallah and V R Lesser. A multiagent reinforcement learning algorithm with non-linear dynamics.J. Artif. Intell. Res., 33:521-549, 2008. · Zbl 1165.91328
[2] V Arnold.Mathematical Methods of Classical Mechanics. Springer, 1989.
[3] J Bailey and G Piliouras. Multiagent learning in network zero-sum games is a Hamiltonian system. InAAMAS, 2019.
[4] D Balduzzi. Strongly-typed agents are guaranteed to interact safely. InICML, 2017.
[5] D Balduzzi, S Racani‘ere, J Martens, J Foerster, K Tuyls, and T Graepel. The mechanics of n-player differentiable games. InICML, 2018a.
[6] D Balduzzi, K Tuyls, J Perolat, and T Graepel. Re-evaluating evaluation. InNeurIPS, 2018b.
[7] R Bott and L Tu.Differential Forms in Algebraic Topology. Springer, 1995. · Zbl 0496.55001
[8] F Bottacin. A Marsden-Weinstein reduction theorem for presymplectic manifolds. 2005.
[9] M Bowling and M Veloso. Multiagent learning using a variable learning rate.Artificial Intelligence, 136:215-250, 2002. · Zbl 0995.68075
[10] M Bowling. Convergence and no-regret in multiagent learning. InNeurIPS, pages 209-216, 2004.
[11] Y Burda, H Edwards, D Pathak, A Storkey, T Darrell, and A Efros. Large-scale study of curiosity-driven learning. InICLR, 2019.
[12] O Candogan, I Menache, A Ozdaglar, and P A Parrilo. Flows and decompositions of games: harmonic and potential games.Mathematics of Operations Research, 36(3):474-503, 2011. · Zbl 1239.91006
[13] C Daskalakis, P W Goldberg, and C Papadimitriou. The complexity of computing a Nash equilibrium.SIAM J. Computing, 39(1):195-259, 2009. · Zbl 1185.91019
[14] C Daskalakis, A Ilyas, V Syrgkanis, and H Zeng. Training GANs with optimism. InICLR, 2018.
[15] F Facchinei and C Kanzow. Generalized Nash equilibrium problems.Annals of Operations Research, 175(1):177-211, 2010. · Zbl 1185.91016
[16] S Feizi, C Suh, F Xia, and D Tse.Understanding GANs:the LQG setting.In arXiv:1710.10793, 2017.
[17] J N Foerster, R Y Chen, M Al-Shedivat, S Whiteson, P Abbeel, and I Mordatch. Learning with opponent-learning awareness. InAAMAS, 2018.
[18] I Gemp and S Mahadevan. Online monotone optimization. InarXiv:1608.07888, 2016.
[19] I Gemp and S Mahadevan. Online monotone games. InarXiv:1710.07328, 2017.
[20] I Gemp and S Mahadevan. Global convergence to the equilibrium of GANs using variational inequalities. InarXiv:1808.01531, 2018.
[21] G Gidel, R A Hemmat, M Pezeshki, R Lepriol, G Huang, S Lacoste-Julien, and I Mitliagkas. Negative momentum for improved game dynamics. InarXiv:1807.04740, 2018.
[22] G Gidel, H Berard, G Vignoud, P Vincent, and Lacoste-Julien. A variational inequality perspective on generative adversarial networks. InICLR, 2019.
[23] I J Goodfellow, J Pouget-Abadie, M Mirza, B Xu, D Warde-Farley, S Ozair, A Courville, and Y Bengio. Generative adversarial nets. InNeurIPS, 2014.
[24] V Guillemin and S Sternberg.Symplectic Techniques in Physics. Cambridge University Press, 1990. · Zbl 0734.58005
[25] S Hart and A Mas-Colell.Simple Adaptive Strategies: From Regret-Matching to Uncoupled Dynamics. World Scientific, 2013. · Zbl 1298.91019
[26] M Heusel, H Ramsauer, T Unterthiner, B Nessler, G Klambauer, and S Hochreiter. GANs trained by a two time-scale update rule converge to a Nash equilibrium. InNeurIPS, 2017.
[27] M Jaderberg, W M Czarnecki, S Osindero, O Vinyals, A Graves, and K Kavukcuoglu. Decoupled neural interfaces using synthetic gradients. InICML, 2017.
[28] X Jiang, L Lim, Y Yao, and Y Ye. Statistical ranking and combinatorial Hodge theory. Math. Program., Ser. B, 127:203-244, 2011. · Zbl 1210.90142
[29] J D Lee, M Simchowitz, M I Jordan, and B Recht. Gradient descent converges to minimizers. InCOLT, 2016.
[30] JD Lee, I Panageas, G Piliouras, M Simchowitz, MI Jordan, and B Recht. First-order methods almost always avoid saddle points. InarXiv:1710.07406, 2017. · Zbl 1415.90089
[31] A Letcher, J Foerster, D Balduzzi, T Rockt¨aschel, and S Whiteson. Stable opponent shaping in differentiable games. InICLR, 2019.
[32] B Liu, J Liu, M Ghavamzadeh, S Mahadevan, and M Petrik. Proximal gradient temporal difference learning algorithms. InIJCAI, 2016. · Zbl 06987098
[33] X Lu. Hamiltonian games.Journal of Combinatorial Theory, Series B, 55:18-32, 1992. · Zbl 0702.90108
[34] P Mertikopoulos and Z Zhou. Learning in games with continuous action sets and unknown payoff functions. InarXiv:1608.07310, 2016. · Zbl 1420.91027
[35] P Mertikopoulos, C Papadimitriou, and G Piliouras. Cycles in adversarial regularized learning. InSODA, 2018. · Zbl 1403.68200
[36] P Mertikopoulos, H Zenati, B Lecouat, C Foo, V Chandrasekhar, and G Piliouras. Mirror descent in saddle-point problems: Going the extra (gradient) mile. InICLR, 2019.
[37] L Mescheder, S Nowozin, and A Geiger. The numerics of GANs. InNeurIPS. 2017.
[38] L Mescheder, A Geiger, and S Nowozin. Which training methods for GANs do actually converge? InICML, 2018.
[39] L Metz, B Poole, D Pfau, and J Sohl-Dickstein. Unrolled generative adversarial networks. InICLR, 2017.
[40] D Monderer and L S Shapley. Potential games.Games and Economic Behavior, 14:124-143, 1996. · Zbl 0862.90137
[41] V Nagarajan and J Z Kolter. Gradient descent GAN optimization is locally stable. In NeurIPS, 2017.
[42] J Nash. Equilibrium points inn-person games.PNAS, 36(1):48-49, 1950. · Zbl 0036.01104
[43] Y Nesterov.Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, 2004. · Zbl 1086.90045
[44] J. Ortega and W. Rheinboldt.Iterative Solution of Nonlinear Equations in Several Variables. Society for Industrial and Applied Mathematics, 2000. · Zbl 0949.65053
[45] C Papadimitriou and G Piliouras. From Nash equilibria to chain recurrent sets: solution concepts and topology. InITCS, 2016. · Zbl 1335.91009
[46] C Papadimitriou and G Piliouras. From Nash equilibria to chain recurrent sets: an algorithmic solution concept for game theory.Entropy, 20, 2018.
[47] D Pathak, P Agrawal, A A Efros, and T Darrell. Curiosity-driven exploration by selfsupervised prediction. InICML, 2017.
[48] David Pfau and Oriol Vinyals. Connecting generative adversarial networks and actor-critic methods. InarXiv:1610.01945, 2016.
[49] S Racani‘ere, T Weber, D P Reichert, L Buesing, A Guez, D J Rezende, A P Badia, O Vinyals, N Heess, Y Li, R Pascanu, P Battaglia, D Hassabis, D Silver, and D Wierstra. Imagination-augmented agents for deep reinforcement learning. InNeurIPS, 2017.
[50] S Rakhlin and K Sridharan. Optimization, learning, and games with predictable sequences. InNeurIPS, 2013.
[51] J B Rosen. Existence and uniqueness of equilibrium points for concaveN-person games. Econometrica, 33(3):520-534, 1965. · Zbl 0142.17603
[52] R W Rosenthal. A class of games possessing pure-strategy Nash equilibria.Int J Game Theory, 2:65-67, 1973. · Zbl 0259.90059
[53] T Salimans, I Goodfellow, W Zaremba, V Cheung, A Radford, and X Chen. Improved techniques for training GANs. InNeurIPS, 2016.
[54] S Santurkar, L Schmidt, and A Madry. A classification-based study of covariate shift in GAN distributions. InICML, 2018.
[55] G Scutari, D P Palomar, F Facchinei, and J Pang. Convex optimization, game theory, and variational inequality theory.IEEE Signal Processing Magazine, pages 35-49, 2010.
[56] Y Shoham and K Leyton-Brown.Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, 2008. · Zbl 1163.91006
[57] M Shub.Global Stability of Dynamical Systems. Springer, 2000.
[58] S Singh, M Kearns, and Y Mansour. Nash convergence of gradient dynamics in general-sum games. InUAI, 2000.
[59] G Stoltz and G Lugosi. Learning correlated equilibria in games with compact sets of strategies.Games and Economic Behavior, 59:187-208, 2007. · Zbl 1271.91012
[60] V Syrgkanis, A Agarwal, H Luo, and R E Schapire. Fast convergence of regularized learning in games. InNeurIPS, 2015.
[61] A Vezhnevets, S Osindero, T Schaul, N Heess, M Jaderberg, D Silver, and K Kavukcuoglu. FeUdal networks for hierarchical reinforcement learning. InICML, 2017.
[62] G Wayne and L F Abbott. Hierarchical control using networks trained with higher-level forward models.Neural Computation, (26), 2014.
[63] J Zhu, T Park, P Isola, and A A Efros. Unpaired image-to-image translation using cycleconsistent adversarial networks. InCVPR, 2017.
[64] M Zinkevich.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.