Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Abstract

The interplay between exploration and exploitation in competitive multi-agent learning remains far from well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that smooth Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results on convergence in weighted potential games [16,34], we show that Q-learning converges fast in competitive settings regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
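To make the setting concrete, the following is a minimal sketch, not the paper's exact protocol, of smooth (Boltzmann) Q-learning in a two-player zero-sum matrix game. Each agent keeps Q-values over its actions, updates them toward the expected payoff against the opponent's current mixed strategy, and plays a softmax policy whose temperature acts as the exploration rate; with positive, possibly heterogeneous temperatures the play settles at the (logit) QRE. The payoff matrix, temperatures, learning rate, and step count are illustrative assumptions.

import numpy as np

# Illustrative sketch only: two-agent smooth Q-learning with softmax
# (Boltzmann) policies. Row player's payoff matrix is A; column player
# receives -A, so the game is zero-sum. Here A is matching pennies.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def softmax(q, temp):
    z = q / temp
    z -= z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

def smooth_q_learning(A, temp_x=0.5, temp_y=0.3, lr=0.05, steps=20000):
    """Heterogeneous exploration rates (temperatures) for the two agents."""
    qx = np.zeros(A.shape[0])  # row player's Q-values over actions
    qy = np.zeros(A.shape[1])  # column player's Q-values over actions
    for _ in range(steps):
        x = softmax(qx, temp_x)
        y = softmax(qy, temp_y)
        # Move each Q-value toward the expected payoff of that pure action
        # against the opponent's current mixed strategy.
        qx += lr * (A @ y - qx)
        qy += lr * (-A.T @ x - qy)
    return softmax(qx, temp_x), softmax(qy, temp_y)

x, y = smooth_q_learning(A)
print("row-player QRE estimate:", x)   # uniform play in matching pennies
print("col-player QRE estimate:", y)

At the fixed point, each agent's policy is the softmax of its expected payoffs, which is exactly the QRE condition; in matching pennies this is uniform play for any positive temperature, so the printed strategies approach (0.5, 0.5) for both agents.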

Cite

Text

Leonardos et al. "Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality." Neural Information Processing Systems, 2021.

Markdown

[Leonardos et al. "Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/leonardos2021neurips-explorationexploitation/)

BibTeX

@inproceedings{leonardos2021neurips-explorationexploitation,
  title     = {{Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality}},
  author    = {Leonardos, Stefanos and Piliouras, Georgios and Spendlove, Kelly},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/leonardos2021neurips-explorationexploitation/}
}