Learning Nash Equilibrium for General-Sum Markov Games from Batch Data
Abstract
This paper addresses the problem of learning a Nash equilibrium in $\gamma$-discounted multiplayer general-sum Markov Games (MGs). A key feature of this model is that players may either collaborate or compete to increase their rewards. Building an artificial player for general-sum MGs requires learning strategies more complex than those obtainable with techniques developed for two-player zero-sum MGs. In this paper, we introduce a new definition of $\epsilon$-Nash equilibrium in MGs that captures the quality of a strategy in multiplayer games. We prove that minimizing the norm of two Bellman-like residuals implies convergence to such an $\epsilon$-Nash equilibrium. We then show that minimizing an empirical estimate of the $L_p$ norm of these Bellman-like residuals allows learning Nash equilibria for general-sum games in the batch setting. Finally, we introduce a neural network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.
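To make the abstract's central objective concrete, here is a minimal tabular sketch (not the paper's NashNetwork, and all shapes, names, and the choice of an $L_1$ norm are illustrative assumptions) of estimating two Bellman-like residuals from a batch of transitions in a turn-based general-sum MG: an evaluation residual, where every player follows the joint strategy $\pi$, and a best-response residual for the player whose turn it is.

```python
import numpy as np

# Illustrative sketch only: tabular Q-values instead of the paper's
# NashNetwork, random game structure, and a tiny hand-written batch.

gamma = 0.9
n_states, n_actions, n_players = 4, 2, 2
rng = np.random.default_rng(0)

turn = rng.integers(n_players, size=n_states)            # who acts in each state
Q = rng.normal(size=(n_players, n_states, n_actions))    # Q_i(s, a) per player i
pi = np.full((n_states, n_actions), 1.0 / n_actions)     # joint strategy (uniform here)

def residuals(batch):
    """Empirical L1 estimates of the evaluation and best-response residuals."""
    ev, br = [], []
    for s, a, r, s2 in batch:          # r holds one reward per player
        i = turn[s]                    # player acting in state s
        # Evaluation residual: every player's Q should be consistent
        # with all players following pi from s2 onward.
        for j in range(n_players):
            target_ev = r[j] + gamma * pi[s2] @ Q[j, s2]
            ev.append(Q[j, s, a] - target_ev)
        # Best-response residual: the acting player maximizes at its own
        # states and follows pi where other players act.
        if turn[s2] == i:
            cont = Q[i, s2].max()
        else:
            cont = pi[s2] @ Q[i, s2]
        br.append(Q[i, s, a] - (r[i] + gamma * cont))
    return np.abs(ev).mean(), np.abs(br).mean()

# A toy batch of transitions (state, action, reward-vector, next state).
batch = [(0, 1, np.array([1.0, 0.0]), 2), (2, 0, np.array([0.0, 1.0]), 3)]
ev_res, br_res = residuals(batch)
```

In the paper both residual norms are driven down jointly (by a neural network over function approximators); at an exact Nash equilibrium both quantities would vanish.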
Cite
Text
Pérolat et al. "Learning Nash Equilibrium for General-Sum Markov Games from Batch Data." International Conference on Artificial Intelligence and Statistics, 2017.

Markdown

[Pérolat et al. "Learning Nash Equilibrium for General-Sum Markov Games from Batch Data." International Conference on Artificial Intelligence and Statistics, 2017.](https://mlanthology.org/aistats/2017/perolat2017aistats-learning/)

BibTeX
@inproceedings{perolat2017aistats-learning,
title = {{Learning Nash Equilibrium for General-Sum Markov Games from Batch Data}},
author = {Pérolat, Julien and Strub, Florian and Piot, Bilal and Pietquin, Olivier},
booktitle = {International Conference on Artificial Intelligence and Statistics},
year = {2017},
pages = {232-241},
url = {https://mlanthology.org/aistats/2017/perolat2017aistats-learning/}
}