Minimax-Optimal Multi-Agent RL in Markov Games with a Generative Model

Abstract

This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of the two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $ \widetilde{O}\bigg( \frac{H^4 S \sum_{i=1}^m A_i}{\varepsilon^2} \bigg) $ samples, where $m$ is the number of players, $S$ indicates the number of states, $H$ is the horizon, and $A_i$ denotes the number of actions for the $i$-th player. This is minimax-optimal (up to log factor) when $m$ is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which might be of independent interest.

Cite

Text

Li et al. "Minimax-Optimal Multi-Agent RL in Markov Games with a Generative Model." Neural Information Processing Systems, 2022.

Markdown

[Li et al. "Minimax-Optimal Multi-Agent RL in Markov Games with a Generative Model." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/li2022neurips-minimaxoptimal/)

BibTeX

@inproceedings{li2022neurips-minimaxoptimal,
  title     = {{Minimax-Optimal Multi-Agent RL in Markov Games with a Generative Model}},
  author    = {Li, Gen and Chi, Yuejie and Wei, Yuting and Chen, Yuxin},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/li2022neurips-minimaxoptimal/}
}