A Natural Actor-Critic Framework for Zero-Sum Markov Games

Abstract

We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two-player zero-sum Markov games in the tabular case. Our results improve the best-known sample complexities of policy gradient/actor-critic methods for convergence to a Nash equilibrium in the multi-agent setting. We use the error propagation scheme from approximate dynamic programming, recent advances on the global convergence of policy gradient methods, temporal difference learning, and techniques from stochastic primal-dual optimization. Our algorithms feature two stages, requiring the agents to agree on an etiquette before starting their interactions, which is feasible, for instance, in self-play. However, the agents only have access to the joint reward and the joint next state, and not to each other’s actions or policies. Our complexity results match the best-known results for the global convergence of policy gradient algorithms in single-agent RL. We provide numerical verification of our methods on a two-player bandit environment and a two-player game, Alesia. We observe improved empirical performance compared to the recently proposed optimistic gradient descent-ascent variant for Markov games.
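To illustrate the natural actor-critic idea in the simplest setting mentioned in the abstract (the two-player zero-sum bandit, i.e., matrix-game, case), the sketch below shows how natural policy gradient with tabular softmax policies reduces to multiplicative-weights updates when exact action values stand in for a sampled critic. This is not the paper's algorithm; the payoff matrix, step size, and iteration count are arbitrary placeholders chosen for illustration.

import numpy as np

# Illustrative sketch only, not the paper's method: natural policy gradient with
# softmax (tabular) policies in a zero-sum matrix game is a multiplicative-weights
# update when exact action values replace a sample-based critic.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))      # row player maximizes x^T A y
eta, T = 0.1, 2000                   # step size and number of iterations (placeholders)

x = np.ones(5) / 5                   # row player's mixed strategy
y = np.ones(5) / 5                   # column player's mixed strategy
x_avg = np.zeros(5)
y_avg = np.zeros(5)

for _ in range(T):
    q_x = A @ y                      # exact action values for the row player
    q_y = -A.T @ x                   # exact action values for the column player
    x = x * np.exp(eta * q_x)        # multiplicative-weights (natural PG) step
    x /= x.sum()
    y = y * np.exp(eta * q_y)
    y /= y.sum()
    x_avg += x / T                   # average the iterates
    y_avg += y / T

# Duality gap of the averaged strategies; it shrinks toward zero as T grows.
gap = float((A @ y_avg).max() - (x_avg @ A).min())
print(f"duality gap of averaged strategies: {gap:.4f}")

In the Markov-game setting studied in the paper, the exact action values above would be replaced by critic estimates obtained from samples, and the updates would be carried out per state.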

Cite

Text

Alacaoglu et al. "A Natural Actor-Critic Framework for Zero-Sum Markov Games." International Conference on Machine Learning, 2022.

Markdown

[Alacaoglu et al. "A Natural Actor-Critic Framework for Zero-Sum Markov Games." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/alacaoglu2022icml-natural/)

BibTeX

@inproceedings{alacaoglu2022icml-natural,
  title     = {{A Natural Actor-Critic Framework for Zero-Sum Markov Games}},
  author    = {Alacaoglu, Ahmet and Viano, Luca and He, Niao and Cevher, Volkan},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {307--366},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/alacaoglu2022icml-natural/}
}