Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

Abstract

Multiagent learning is a key problem in AI. In the presence of multiple Nash equilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exacerbated if the agents do not know the game and independently receive noisy payoffs. So, multiagent reinforcement learning involves two interrelated problems: identifying the game and learning to play. In this paper, we present optimal adaptive learning, the first algorithm that converges to an optimal Nash equilibrium with probability 1 in any team Markov game. We provide a convergence proof, and show that the algorithm’s parameters are easy to set to meet the convergence conditions.
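
To make the coordination problem in the abstract concrete, here is a minimal Python sketch (our illustration, not code from the paper) of a one-shot team game in which both agents receive the same payoff, yet two Nash equilibria exist and only one is optimal. The payoff matrix is a hypothetical example in the spirit of the classic penalty game; independent learners receiving noisy payoffs can lock into the suboptimal equilibrium, which is exactly the failure mode the paper's algorithm avoids.

```python
import itertools

# Hypothetical team payoff matrix: rows are agent 1's actions, columns
# are agent 2's actions, and both agents receive the same payoff.
# Joint actions (0, 0) and (1, 1) are both Nash equilibria, but only
# (0, 0) achieves the optimal team payoff.
payoff = [
    [10, 0],
    [0, 5],
]

def is_nash(i, j):
    """A joint action is Nash if neither agent gains by deviating alone."""
    best_for_agent1 = max(payoff[k][j] for k in range(2))  # agent 1 deviates
    best_for_agent2 = max(payoff[i][k] for k in range(2))  # agent 2 deviates
    return payoff[i][j] >= best_for_agent1 and payoff[i][j] >= best_for_agent2

for i, j in itertools.product(range(2), repeat=2):
    label = "Nash" if is_nash(i, j) else "not Nash"
    print(f"joint action {(i, j)}: {label}, payoff {payoff[i][j]}")
```

Running this prints that both (0, 0) and (1, 1) are Nash equilibria with payoffs 10 and 5 respectively, so uncoordinated agents that each settle on action 1 are individually unexploitable yet collectively suboptimal.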

Cite

Text

Wang and Sandholm. "Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games." Neural Information Processing Systems, 2002.

Markdown

[Wang and Sandholm. "Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/wang2002neurips-reinforcement/)

BibTeX

@inproceedings{wang2002neurips-reinforcement,
  title     = {{Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games}},
  author    = {Wang, Xiaofeng and Sandholm, Tuomas},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {1603--1610},
  url       = {https://mlanthology.org/neurips/2002/wang2002neurips-reinforcement/}
}