Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

Abstract

This paper addresses the problem of learning a Nash equilibrium in $\gamma$-discounted multiplayer general-sum Markov Games (MGs). A key feature of this model is that players may either collaborate or compete to increase their rewards. Building an artificial player for general-sum MGs therefore requires learning strategies more complex than those obtainable with techniques developed for two-player zero-sum MGs. In this paper, we introduce a new definition of $\epsilon$-Nash equilibrium in MGs that captures the quality of a strategy in multiplayer games. We prove that minimizing the norm of two Bellman-like residuals guarantees convergence to such an $\epsilon$-Nash equilibrium. We then show that minimizing an empirical estimate of the $L_p$ norm of these Bellman-like residuals allows learning a Nash equilibrium in general-sum games in the batch setting. Finally, we introduce a neural network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.
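To illustrate the core idea of minimizing an empirical $L_p$ norm of a Bellman-like residual from batch data, here is a minimal sketch on a toy single-agent MDP with a tabular Q-function. This is not the paper's method (which uses two Bellman-like residuals per player in a general-sum MG and a neural architecture); all names and parameters below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: minimize the empirical L_p norm of an (optimal-)Bellman-like
# residual over a fixed batch of transitions, for a toy single-agent MDP.
# The paper generalizes this idea to two residuals per player in general-sum MGs.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, p = 5, 2, 0.9, 2  # illustrative values

# Batch of transitions (s, a, r, s') gathered by some behavior policy.
batch = [(rng.integers(n_states), rng.integers(n_actions),
          rng.standard_normal(), rng.integers(n_states)) for _ in range(200)]

Q = np.zeros((n_states, n_actions))  # tabular Q-function

def residuals(Q):
    # Residual on each sample: r + gamma * max_a' Q(s', a') - Q(s, a)
    return np.array([r + gamma * Q[s2].max() - Q[s, a] for s, a, r, s2 in batch])

lr = 0.05
for _ in range(500):
    res = residuals(Q)
    grad = np.zeros_like(Q)
    for (s, a, r, s2), d in zip(batch, res):
        # Gradient of |d|^p with respect to the residual d
        g = p * np.sign(d) * abs(d) ** (p - 1)
        grad[s, a] -= g                 # d(residual)/dQ(s,a) = -1
        grad[s2, Q[s2].argmax()] += gamma * g  # (sub)gradient through the max
    Q -= lr * grad / len(batch)  # descend the empirical L_p residual norm

print(float(np.mean(np.abs(residuals(Q)) ** p)))  # should shrink after training
```

With stochastic rewards the empirical residual cannot reach zero, but gradient descent steadily reduces it; the paper's contribution is showing that driving such residual norms down implies closeness to an $\epsilon$-Nash equilibrium.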

Cite

Text

Pérolat et al. "Learning Nash Equilibrium for General-Sum Markov Games from Batch Data." International Conference on Artificial Intelligence and Statistics, 2017.

Markdown

[Pérolat et al. "Learning Nash Equilibrium for General-Sum Markov Games from Batch Data." International Conference on Artificial Intelligence and Statistics, 2017.](https://mlanthology.org/aistats/2017/perolat2017aistats-learning/)

BibTeX

@inproceedings{perolat2017aistats-learning,
  title     = {{Learning Nash Equilibrium for General-Sum Markov Games from Batch Data}},
  author    = {Pérolat, Julien and Strub, Florian and Piot, Bilal and Pietquin, Olivier},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2017},
  pages     = {232--241},
  url       = {https://mlanthology.org/aistats/2017/perolat2017aistats-learning/}
}