Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Abstract

While single-agent policy optimization in a fixed environment has recently attracted substantial research attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game scenario. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play in the setting of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{T})$.
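The fictitious-play idea underlying the paper's algorithms can be illustrated in the simplest possible setting, a single-state zero-sum matrix game: each player repeatedly best-responds to the opponent's empirical action frequencies, and the time-averaged strategies converge to a Nash equilibrium (Robinson, 1951). The sketch below runs this classical dynamic on rock-paper-scissors; it is a toy illustration of fictitious play only, not the authors' Markov-game algorithm (which additionally uses UCB-style optimism and handles unknown structured transitions).

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[0, -1,  1],
              [1,  0, -1],
              [-1, 1,  0]])

T = 100_000
row_counts = np.ones(3)  # uniform pseudo-counts to seed the dynamic
col_counts = np.ones(3)

for _ in range(T):
    # Each player best-responds to the opponent's empirical mixed strategy.
    a = np.argmax(A @ (col_counts / col_counts.sum()))   # row maximizes
    b = np.argmin((row_counts / row_counts.sum()) @ A)   # column minimizes
    row_counts[a] += 1
    col_counts[b] += 1

# Time-averaged strategies approach the unique Nash equilibrium (1/3, 1/3, 1/3).
print(np.round(row_counts / row_counts.sum(), 2))
print(np.round(col_counts / col_counts.sum(), 2))
```

The instantaneous (pure) actions cycle forever; it is only the empirical averages that converge, which is why regret in the paper is likewise measured against the best fixed policy in hindsight rather than stage-by-stage.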

Cite

Text

Qiu et al. "Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions." International Conference on Machine Learning, 2021.

Markdown

[Qiu et al. "Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/qiu2021icml-provably/)

BibTeX

@inproceedings{qiu2021icml-provably,
  title     = {{Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions}},
  author    = {Qiu, Shuang and Wei, Xiaohan and Ye, Jieping and Wang, Zhaoran and Yang, Zhuoran},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {8715--8725},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/qiu2021icml-provably/}
}