Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

Abstract

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.
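To make the setting concrete, below is a minimal, illustrative sketch of policy optimization with function approximation in a two-player zero-sum Markov game. It is not the paper's algorithm and carries none of its guarantees; the game (random rewards and transitions), the one-hot feature map, the linear-softmax policy class, the REINFORCE-style simultaneous ascent/descent updates, and all constants (state/action counts, horizon, learning rate, iteration count) are assumptions made up for the example.

```python
# Illustrative only: simultaneous policy-gradient ascent/descent with
# softmax policies that are linear in state features, run on a tiny
# randomly generated finite-horizon zero-sum Markov game.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_a, n_b, horizon = 4, 3, 3, 5
# Max player's reward r[s, a, b] (the min player receives -r) and
# transition kernel P[s, a, b] over next states.
r = rng.uniform(-1.0, 1.0, size=(n_states, n_a, n_b))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_a, n_b))

# One-hot state features; a softmax policy with logits linear in phi(s)
# is the simplest instance of a parametric policy class.
phi = np.eye(n_states)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def policy(theta, s):
    # theta has shape (n_actions, feature_dim); logits are theta @ phi(s).
    return softmax(theta @ phi[s])

def rollout(theta_max, theta_min):
    """Sample one episode; return the visited (s, a, b) triples and the return."""
    s, traj, total = 0, [], 0.0
    for _ in range(horizon):
        a = rng.choice(n_a, p=policy(theta_max, s))
        b = rng.choice(n_b, p=policy(theta_min, s))
        traj.append((s, a, b))
        total += r[s, a, b]
        s = rng.choice(n_states, p=P[s, a, b])
    return traj, total

theta_max = np.zeros((n_a, n_states))
theta_min = np.zeros((n_b, n_states))
lr = 0.05

for _ in range(2000):
    traj, G = rollout(theta_max, theta_min)
    # REINFORCE score functions: grad log pi(a|s) = (e_a - pi(.|s)) phi(s)^T.
    g_max = np.zeros_like(theta_max)
    g_min = np.zeros_like(theta_min)
    for s, a, b in traj:
        pa, pb = policy(theta_max, s), policy(theta_min, s)
        g_max[a] += phi[s]
        g_max -= np.outer(pa, phi[s])
        g_min[b] += phi[s]
        g_min -= np.outer(pb, phi[s])
    theta_max += lr * G * g_max   # max player ascends on its return
    theta_min -= lr * G * g_min   # min player descends (its payoff is -r)
```

This sketch only illustrates the objects the abstract refers to (a Markov game, a parametric policy class, sample-based policy updates for both players); the paper's contribution is an algorithm and analysis that make such updates provably sample- and iteration-efficient.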

Cite

Text

Zhao et al. "Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games." Artificial Intelligence and Statistics, 2022.

Markdown

[Zhao et al. "Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games." Artificial Intelligence and Statistics, 2022.](https://mlanthology.org/aistats/2022/zhao2022aistats-provably/)

BibTeX

@inproceedings{zhao2022aistats-provably,
  title     = {{Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games}},
  author    = {Zhao, Yulai and Tian, Yuandong and Lee, Jason and Du, Simon},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2022},
  pages     = {2736--2761},
  volume    = {151},
  url       = {https://mlanthology.org/aistats/2022/zhao2022aistats-provably/}
}