A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
Abstract
There has been a resurgence of interest in multiagent reinforcement learning (MARL), due partly to the recent success of deep neural networks. The simplest form of MARL is independent reinforcement learning (InRL), where each agent treats all of its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training and then fail to generalize sufficiently during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe a meta-algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The meta-algorithm generalizes previous algorithms such as InRL, iterated best response, double oracle, and fictitious play. Then, we propose a scalable implementation that reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in three partially observable settings: gridworld coordination problems, emergent language games, and poker.
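The abstract names joint-policy correlation without defining it. The sketch below shows one way such a metric can be computed, under our own assumptions: train several independent InRL runs, build a square matrix whose entry (i, j) is the mean return when player 1 uses the policy from run i and player 2 uses the policy from run j, then compare the diagonal (partners trained together) with the off-diagonal (partners never seen in training) as a proportional loss. The function name `joint_policy_correlation_loss` and the input format are hypothetical, and the numbers in the example are made up for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def joint_policy_correlation_loss(returns: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (diagonal mean, off-diagonal mean, proportional loss).

    returns[i][j] is the mean return when player 1 uses the policy from
    independent training run i and player 2 uses the policy from run j
    (a hypothetical input format).
    """
    n = len(returns)
    if n < 2 or any(len(row) != n for row in returns):
        raise ValueError("need a square matrix covering at least two training runs")
    # Partners that were trained together in the same run.
    diag = mean(returns[i][i] for i in range(n))
    # Partners that never met during training.
    off = mean(returns[i][j] for i in range(n) for j in range(n) if i != j)
    if diag == 0:
        raise ValueError("proportional loss is undefined when the diagonal mean is zero")
    return diag, off, (diag - off) / diag


d, o, loss = joint_policy_correlation_loss([[30.0, 10.0], [12.0, 28.0]])
print(d, o, round(loss, 3))  # prints: 29.0 11.0 0.621
```

A loss near zero would mean cross-run partners do about as well as the partners they trained with; a large value signals the overfitting to training partners that the abstract describes.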
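The meta-algorithm is described here only at a high level. What follows is a minimal sketch of such a loop under stated assumptions, not the authors' implementation. The game is a two-player zero-sum matrix game (rock-paper-scissors). Exact best responses over pure strategies stand in for deep reinforcement learning oracles. Each meta-solver maps the empirical payoff matrix between the two policy populations to a meta-strategy per player. All names (`psro`, `uniform_meta`, `last_meta`, `fictitious_play_meta`, `RPS`) are hypothetical. Swapping the meta-solver roughly recovers the special cases listed above: uniform weights behave like fictitious play, all weight on the newest policy behaves like iterated best response, and a Nash meta-solver behaves like double oracle. For the Nash case the sketch approximates the equilibrium by running fictitious play on the empirical game, an assumption standing in for an exact linear-programming solver.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
MetaSolver = Callable[[Matrix], Tuple[List[float], List[float]]]

# Row player's payoffs for rock-paper-scissors; the column player receives the negation.
RPS: Matrix = [[0.0, -1.0, 1.0],
               [1.0, 0.0, -1.0],
               [-1.0, 1.0, 0.0]]


def uniform_meta(emp: Matrix) -> Tuple[List[float], List[float]]:
    # Equal weight on every policy in each population (fictitious-play-like).
    m, k = len(emp), len(emp[0])
    return [1.0 / m] * m, [1.0 / k] * k


def last_meta(emp: Matrix) -> Tuple[List[float], List[float]]:
    # All weight on each population's newest policy (iterated-best-response-like).
    m, k = len(emp), len(emp[0])
    return [0.0] * (m - 1) + [1.0], [0.0] * (k - 1) + [1.0]


def fictitious_play_meta(emp: Matrix, steps: int = 1000) -> Tuple[List[float], List[float]]:
    # Approximate a Nash equilibrium of the empirical zero-sum game with
    # fictitious play; a stand-in for the exact solver used by double oracle.
    m, k = len(emp), len(emp[0])
    row_counts, col_counts = [0] * m, [0] * k
    r, c = 0, 0
    for _ in range(steps):
        row_counts[r] += 1
        col_counts[c] += 1
        r = max(range(m), key=lambda i: sum(emp[i][j] * col_counts[j] for j in range(k)))
        c = max(range(k), key=lambda j: -sum(emp[i][j] * row_counts[i] for i in range(m)))
    return [x / steps for x in row_counts], [x / steps for x in col_counts]


def mixture(population: Sequence[int], weights: Sequence[float], n_actions: int) -> List[float]:
    # Collapse a meta-strategy over population members into a distribution over actions.
    dist = [0.0] * n_actions
    for action, w in zip(population, weights):
        dist[action] += w
    return dist


def best_response(payoff: Matrix, opp: Sequence[float], as_row: bool) -> int:
    # Exact best response to the opponent's action distribution (stands in for a deep RL oracle).
    n_rows, n_cols = len(payoff), len(payoff[0])
    if as_row:
        return max(range(n_rows), key=lambda a: sum(payoff[a][b] * opp[b] for b in range(n_cols)))
    return max(range(n_cols), key=lambda b: -sum(payoff[a][b] * opp[a] for a in range(n_rows)))


def psro(payoff: Matrix, meta_solver: MetaSolver,
         iterations: int) -> Tuple[List[int], List[int], List[float], List[float]]:
    rows, cols = [0], [0]  # each population starts from one arbitrary policy (action 0)
    for _ in range(iterations):
        # Empirical game: payoff for every pair of population members.
        emp = [[payoff[a][b] for b in cols] for a in rows]
        row_meta, col_meta = meta_solver(emp)
        # Each player adds a best response to the other player's meta-strategy mixture.
        new_row = best_response(payoff, mixture(cols, col_meta, len(payoff[0])), as_row=True)
        new_col = best_response(payoff, mixture(rows, row_meta, len(payoff)), as_row=False)
        rows.append(new_row)
        cols.append(new_col)
    emp = [[payoff[a][b] for b in cols] for a in rows]
    row_meta, col_meta = meta_solver(emp)
    return rows, cols, row_meta, col_meta


if __name__ == "__main__":
    rows, cols, row_meta, col_meta = psro(RPS, fictitious_play_meta, iterations=6)
    print("row population:", rows)
    print("row meta-strategy over actions:", mixture(rows, row_meta, len(RPS)))
```

Because each meta-solver reads only the empirical matrix, replacing `fictitious_play_meta` with `uniform_meta` or `last_meta` changes which opponent mixture the oracle trains against without touching the rest of the loop. With `last_meta` the populations cycle through rock, paper, and scissors, which is the familiar failure mode of iterated best response.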
Cite
Text
Lanctot et al. "A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning." Neural Information Processing Systems, 2017.
Markdown
[Lanctot et al. "A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/lanctot2017neurips-unified/)
BibTeX
@inproceedings{lanctot2017neurips-unified,
title = {{A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning}},
author = {Lanctot, Marc and Zambaldi, Vinicius and Gruslys, Audrunas and Lazaridou, Angeliki and Tuyls, Karl and Perolat, Julien and Silver, David and Graepel, Thore},
booktitle = {Neural Information Processing Systems},
year = {2017},
pages = {4190--4203},
url = {https://mlanthology.org/neurips/2017/lanctot2017neurips-unified/}
}