Learning Nash Equilibria in Zero-Sum Markov Games: A Single-Timescale Algorithm Under Weak Reachability

Abstract

We consider decentralized learning in zero-sum Markov games, where each player observes only its own payoff information and is agnostic to the opponent's actions and payoffs. Previous works established convergence to a Nash equilibrium in this setting using double-timescale algorithms under strong reachability assumptions. We address the open problem of efficiently reaching an approximate Nash equilibrium with an uncoupled, single-timescale algorithm under weaker conditions. Our contribution is a rational and convergent algorithm that employs Tsallis-entropy regularization within a value-iteration-based approach. The algorithm learns an approximate Nash equilibrium in polynomial time, requiring only the existence of a policy pair that induces an irreducible and aperiodic Markov chain, thereby considerably weakening past assumptions. Our analysis leverages negative drift inequalities and establishes novel properties of Tsallis entropy that are of independent interest.
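For context, the regularizer named in the abstract is, in its standard form for a policy $\pi$ over a finite action set with entropic index $q$ (the particular index and scaling used in the paper are not stated on this page, so this is the textbook definition):

$$
S_q(\pi) \;=\; \frac{1}{q-1}\Big(1 - \sum_{a} \pi(a)^q\Big), \qquad q > 0,\; q \neq 1,
$$

which recovers the Shannon entropy $-\sum_a \pi(a)\log \pi(a)$ in the limit $q \to 1$.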

Cite

Text

Ouhamma and Kamgarpour. "Learning Nash Equilibria in Zero-Sum Markov Games: A Single-Timescale Algorithm Under Weak Reachability." ICML 2024 Workshops: RLControlTheory, 2024.

Markdown

[Ouhamma and Kamgarpour. "Learning Nash Equilibria in Zero-Sum Markov Games: A Single-Timescale Algorithm Under Weak Reachability." ICML 2024 Workshops: RLControlTheory, 2024.](https://mlanthology.org/icmlw/2024/ouhamma2024icmlw-learning/)

BibTeX

@inproceedings{ouhamma2024icmlw-learning,
  title     = {{Learning Nash Equilibria in Zero-Sum Markov Games: A Single-Timescale Algorithm Under Weak Reachability}},
  author    = {Ouhamma, Reda and Kamgarpour, Maryam},
  booktitle = {ICML 2024 Workshops: RLControlTheory},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/ouhamma2024icmlw-learning/}
}