
A unified analysis of first-order methods for smooth games via integral quadratic constraints. (English) Zbl 07370620

Summary: The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements. In this work, we adapt the IQC theory to study first-order methods for smooth and strongly-monotone games and show how to design tailored quadratic constraints that yield tight upper bounds on convergence rates. Using this framework, we recover the existing bound for the gradient method (GD), derive sharper bounds for the proximal point method (PPM) and the optimistic gradient method (OG), and provide for the first time a global convergence rate for the negative momentum method (NM) with an iteration complexity of \(\mathcal{O}(\kappa^{1.5})\), which matches its known lower bound. In addition, for time-varying systems, we prove that the gradient method with optimal step size achieves the fastest provable worst-case convergence rate with quadratic Lyapunov functions. Finally, we further extend our analysis to stochastic games and study the impact of multiplicative noise on different algorithms. We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it queries the gradient only once per batch (in contrast with the stochastic strongly-convex optimization setting, where such acceleration has been demonstrated). However, we exhibit an algorithm that achieves acceleration with two gradient queries per batch. Our code is publicly available at https://github.com/gd-zhang/IQC-Game.
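
To illustrate the kind of certificate the summary describes, the sketch below (not the authors' released code) sets up the small semidefinite feasibility problem that a quadratic-Lyapunov/IQC analysis reduces to for the plain gradient method \(z_{k+1} = z_k - \eta F(z_k)\) applied to an \(L\)-Lipschitz, \(\mu\)-strongly-monotone operator \(F\), and bisects over the contraction rate \(\rho\). The constraint matrices and the S-procedure step follow the standard recipe; the function names, the choice of CVXPY with the SCS solver, and the parameter values are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp


def lmi_feasible(rho, eta, mu, L):
    """Can the rate rho be certified for x_{k+1} = x_k - eta*u_k with a quadratic Lyapunov function?"""
    # Pointwise quadratic constraints on (x, u) = (z - z*, F(z)), written as [x u] M [x u]^T >= 0:
    M_mono = np.array([[-mu, 0.5], [0.5, 0.0]])    # strong monotonicity: <u, x> >= mu * ||x||^2
    M_lip = np.array([[L**2, 0.0], [0.0, -1.0]])   # Lipschitzness: ||u||^2 <= L^2 * ||x||^2
    # Decrease term V(x_{k+1}) - rho^2 * V(x_k) for V(x) = x^2 (P normalized to 1 by homogeneity):
    M_lyap = np.array([[1.0 - rho**2, -eta], [-eta, eta**2]])
    lam = cp.Variable(2, nonneg=True)              # S-procedure / IQC multipliers
    lmi = M_lyap + lam[0] * M_mono + lam[1] * M_lip
    prob = cp.Problem(cp.Minimize(0), [lmi << 0])
    prob.solve(solver=cp.SCS)
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)


def smallest_certified_rate(eta, mu, L, tol=1e-4):
    """Bisect on rho in (0, 1) for the smallest rate the LMI can certify."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lmi_feasible(mid, eta, mu, L):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    mu, L = 1.0, 10.0          # condition number kappa = L/mu = 10 (illustrative values)
    eta = mu / L**2            # step size optimizing the classical bound for GD
    rho = smallest_certified_rate(eta, mu, L)
    # The certified rate should essentially match the known bound sqrt(1 - mu^2/L^2).
    print(rho, np.sqrt(1.0 - (mu / L) ** 2))
```

For this one-step method the LMI is feasible exactly when \(\rho^2 \ge 1 - 2\eta\mu + \eta^2 L^2\) (take multipliers \(\lambda_1 = 2\eta\), \(\lambda_2 = \eta^2\)), which recovers the classical GD bound mentioned in the summary; the paper's contribution is to carry out analogous constructions, with tailored constraints, for PPM, OG, NM, and the stochastic setting.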

MSC:

68T05 Learning and adaptive systems in artificial intelligence
Full Text: arXiv Link
