zbMATH — the first resource for mathematics

Revisiting log-linear learning: asynchrony, completeness and payoff-based implementation. (English) Zbl 1239.91017
Summary: Log-linear learning is a learning algorithm that provides guarantees on the fraction of time that the joint action profile spends at a potential maximizer in potential games. The traditional analysis of log-linear learning explicitly computes the stationary distribution and hence requires a highly structured environment. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we ask to what degree one can relax these structural assumptions while ensuring that only potential function maximizers are stochastically stable. In this paper, we introduce slight variants of log-linear learning that provide the desired asymptotic guarantees while relaxing the structural assumptions to allow synchronous updates, time-varying action sets, and limitations in the information available to the players. The motivation for these relaxations stems from the applicability of log-linear learning to the control of multi-agent systems, where the original structural assumptions are unrealistic from an implementation perspective.
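To make the dynamics concrete, the following is a minimal sketch (not from the paper) of classical asynchronous log-linear learning on a two-player identical-interest game, where each player's utility equals the potential. At every step one player revises and samples an action with probability proportional to exp(beta · utility); the game matrix `PHI`, the rationality parameter `beta`, and all function names are illustrative choices, not the authors' notation.

```python
import math
import random

# Identical-interest (potential) game: both players receive the potential value.
# Profiles are indexed by (a1, a2); the potential maximizer is (0, 0).
PHI = [[1.0, 0.0],
       [0.0, 0.5]]

def logit_choice(utils, beta, rng):
    """Sample an action with probability proportional to exp(beta * utility)."""
    weights = [math.exp(beta * u) for u in utils]
    r = rng.random() * sum(weights)
    for a, w in enumerate(weights):
        r -= w
        if r <= 0:
            return a
    return len(utils) - 1

def log_linear_learning(beta=5.0, steps=20000, seed=0):
    """Asynchronous log-linear learning; returns the fraction of time the
    joint action profile sits at the potential maximizer (0, 0)."""
    rng = random.Random(seed)
    profile = [1, 1]          # start away from the maximizer
    hits = 0
    for _ in range(steps):
        i = rng.randrange(2)  # exactly one player revises per step (asynchrony)
        other = profile[1 - i]
        # In an identical-interest game every player's utility is the potential.
        utils = [PHI[a][other] if i == 0 else PHI[other][a] for a in range(2)]
        profile[i] = logit_choice(utils, beta, rng)
        hits += profile == [0, 0]
    return hits / steps
```

Under these assumptions the stationary distribution is the Gibbs measure, with probability of a profile proportional to exp(beta · potential), so for large beta the empirical frequency concentrates on the potential maximizer; the paper's variants aim to preserve this asymptotic property without relying on the explicit Gibbs form.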

91A26 Rationality and learning in game theory
Full Text: DOI
[1] Alós-Ferrer, C.; Netzer, N., The logit-response dynamics, Games econ. behav., 68, 413-427, (2010) · Zbl 1207.91017
[2] Arslan, G.; Marden, J.R.; Shamma, J.S., Autonomous vehicle-target assignment: A game theoretical formulation, J. dynam. syst. meas. control, 129, 584-596, (2007)
[3] Asadpour, A., Saberi, A., 2009. On the inefficiency ratio of stable equilibria in congestion games. In: Proceedings of the 5th International Workshop on Internet and Network Economics, pp. 545-552.
[4] Babichenko, Y., 2010. Completely uncoupled dynamics and Nash equilibria. Working paper. · Zbl 1211.91059
[5] Beggs, A., Waiting times and equilibrium selection, Econ. theory, 25, 599-628, (2005) · Zbl 1127.91009
[6] Benaim, M., Sandholm, W.H., 2007. Logit evolution in potential games: Reversibility, rates of convergence, large deviations, and equilibrium selection. Working paper.
[7] Blum, A., Even-Dar, E., Ligett, K., 2006. Routing without regret: On convergence to Nash equilibria of regret-minimizing algorithms in routing games. In: Symposium on Principles of Distributed Computing (PODC), pp. 45-52. · Zbl 1314.91050
[8] Blume, L., The statistical mechanics of strategic interaction, Games econ. behav., 5, 387-424, (1993) · Zbl 0797.90123
[9] Blume, L., Population games, (), 425-460
[10] Blume, L., How noise matters, Games econ. behav., 44, 251-271, (2003) · Zbl 1056.91011
[11] Foster, D.P.; Young, H.P., Regret testing: learning to play Nash equilibrium without knowing you have an opponent, Theoretical econ., 1, 341-367, (2006)
[12] Germano, F.; Lugosi, G., Global Nash convergence of Foster and Young's regret testing, Games econ. behav., 60, 135-154, (2007) · Zbl 1155.91318
[13] Hart, S.; Mas-Colell, A., A reinforcement learning procedure leading to correlated equilibrium, (), 181-200 · Zbl 1023.91004
[14] Mannor, S.; Shamma, J.S., Multi-agent learning for engineers, Artificial intelligence, 171, 7, 417-422, (2007), Special issue on Foundations of Multi-Agent Learning · Zbl 1168.68477
[15] Marden, J.R., Wierman, A., submitted for publication. Distributed welfare games.
[16] Marden, J.R., Arslan, G., Shamma, J.S., 2007a. Connections between cooperative control and potential games illustrated on the consensus problem. In: Proceedings of the 2007 European Control Conference (ECC ʼ07).
[17] Marden, J.R., Arslan, G., Shamma, J.S., 2007b. Regret based dynamics: Convergence in weakly acyclic games. In: Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Honolulu, Hawaii.
[18] Marden, J.R.; Arslan, G.; Shamma, J.S., Connections between cooperative control and potential games, IEEE trans. syst. man cybernet., part B, 39, 1393-1407, (2009)
[19] Marden, J.R.; Arslan, G.; Shamma, J.S., Joint strategy fictitious play with inertia for potential games, IEEE trans. automat. control, 54, 208-220, (2009) · Zbl 1367.91035
[20] Marden, J.R.; Young, H.P.; Arslan, G.; Shamma, J.S., Payoff based dynamics for multi-player weakly acyclic games, SIAM J. control optim., 48, 373-396, (2009) · Zbl 1186.91033
[21] Monderer, D.; Shapley, L., Fictitious play property for games with identical interests, J. econ. theory, 68, 258-265, (1996) · Zbl 0849.90130
[22] Monderer, D.; Shapley, L.S., Potential games, Games econ. behav., 14, 124-143, (1996) · Zbl 0862.90137
[23] Montanari, A.; Saberi, A., The spread of innovations in social networks, Proc. natl. acad. sci. USA, 107, 20196-20201, (2010)
[24] Roughgarden, T., Selfish routing and the price of anarchy, (2005), MIT Press Cambridge, MA, USA
[25] Shah, D., Shin, J., 2010. Dynamics in congestion games. In: ACM SIGMETRICS.
[26] Shapley, L.S., Stochastic games, Proc. natl. acad. sci. USA, 39, 10, 1095-1100, (1953) · Zbl 0051.35805
[27] Shoham, Y.; Powers, R.; Grenager, T.; Vohra, R.; Wellman, M., If multi-agent learning is the answer, what is the question?, Foundations of multi-agent learning, Artificial intelligence, 171, 7, 365-377, (2007) · Zbl 1168.68493
[28] Vetta, A., 2002. Nash equilibria in competitive societies with applications to facility location, traffic routing, and auctions. In: Proc. of Symp. on Fdns. of Comp. Sci., pp. 416-425.
[29] Voorneveld, M., Best-response potential games, Econ. letters, 66, 289-295, (2000) · Zbl 0951.91008
[30] Wolpert, D.; Tumer, K., An overview of collective intelligence, ()
[31] Young, H.P., The evolution of conventions, Econometrica, 61, 1, 57-84, (1993) · Zbl 0773.90101
[32] Young, H.P., Individual strategy and social structure, (1998), Princeton University Press Princeton, NJ
[33] Young, H.P., Strategic learning and its limits, (2005), Oxford University Press
[34] Young, H.P., Learning by trial and error, Games econ. behav., 65, 626-643, (2009) · Zbl 1158.91327