
The Hanabi challenge: a new frontier for AI research. (English) Zbl 1476.68223

Summary: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
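For concreteness, the following is a minimal sketch of interacting with the Hanabi Learning Environment through its Python interface: a uniformly random legal-move policy played in self-play for a few episodes. It assumes the rl_env.make entry point and the observation dictionary keys (current_player, player_observations, legal_moves) exposed by the open-source repository; exact names and reward semantics may vary across versions, so it should be read as an illustrative sketch rather than the canonical API.

import random

from hanabi_learning_environment import rl_env  # assumed package layout of the open-source release

# Two-player game with the standard full rule set (assumed preset name).
env = rl_env.make(environment_name='Hanabi-Full', num_players=2)

for episode in range(5):
    observations = env.reset()
    done = False
    episode_return = 0
    while not done:
        # Each player only sees its own imperfect view of the state.
        current_player = observations['current_player']
        obs = observations['player_observations'][current_player]
        # Placeholder policy: choose a uniformly random legal move.
        action = random.choice(obs['legal_moves'])
        observations, reward, done, _ = env.step(action)
        episode_return += reward  # accumulate the per-step reward reported by the environment
    print(f'Episode {episode}: return {episode_return}')

A learning agent would replace the random choice with a policy conditioned on the observation (and, for theory-of-mind approaches, on beliefs about the other players), while the surrounding loop and the self-play evaluation protocol stay the same.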

MSC:

68T01 General topics in artificial intelligence
68T05 Learning and adaptive systems in artificial intelligence
68T42 Agent technology and artificial intelligence
91A12 Cooperative games
91A46 Combinatorial games
