Posterior Sampling for Multi-Agent Reinforcement Learning: Solving Extensive Games with Imperfect Information

Abstract

Posterior sampling for reinforcement learning (PSRL) is a useful framework for making decisions in an unknown environment. PSRL maintains a posterior distribution over environments and then plans in an environment sampled from that posterior. Although PSRL works well on single-agent reinforcement learning problems, how to apply it to multi-agent reinforcement learning problems is relatively unexplored. In this work, we extend PSRL to two-player zero-sum extensive games with imperfect information (TEGI), a class of multi-agent systems. More specifically, we combine PSRL with counterfactual regret minimization (CFR), the leading algorithm for TEGI with a known environment. Our main contribution is a novel design of interaction strategies. With these interaction strategies, our algorithm provably converges to the Nash equilibrium at a rate of $O(\sqrt{\log T/T})$. Empirical results show that our algorithm works well.
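The sample-then-plan loop the abstract describes can be illustrated in the simplest Bayesian setting. The sketch below is not the paper's algorithm (which operates on extensive games via CFR); it is a minimal analogue on a Bernoulli bandit, where the posterior is a Beta distribution per arm, "planning" reduces to picking the arm with the highest sampled mean, and all names are illustrative:

```python
import random

def psrl_bandit(true_probs, rounds, seed=0):
    """Posterior-sampling loop on a Bernoulli bandit (Thompson sampling).

    Illustrative analogue of PSRL: maintain a Beta posterior per arm,
    sample one environment (a vector of arm means) from the posterior,
    act optimally in that sampled environment, then update the posterior
    with the observed reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1.0] * n_arms  # Beta(1, 1) uniform priors
    beta = [1.0] * n_arms
    counts = [0] * n_arms   # how often each arm was pulled
    for _ in range(rounds):
        # Sample an environment from the current posterior...
        sampled = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # ...and "plan": act optimally as if the sample were the truth.
        arm = max(range(n_arms), key=lambda a: sampled[a])
        # Interact with the real environment and update the posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        counts[arm] += 1
    return counts
```

Running `psrl_bandit([0.2, 0.8], 2000)` concentrates pulls on the better arm as the posterior sharpens; the paper's contribution lies in making this sample-then-plan pattern work when the "planner" is CFR and the interaction strategies of the two players must be designed carefully.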

Cite

Text

Zhou et al. "Posterior Sampling for Multi-Agent Reinforcement Learning: Solving Extensive Games with Imperfect Information." International Conference on Learning Representations, 2020.

Markdown

[Zhou et al. "Posterior Sampling for Multi-Agent Reinforcement Learning: Solving Extensive Games with Imperfect Information." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/zhou2020iclr-posterior/)

BibTeX

@inproceedings{zhou2020iclr-posterior,
  title     = {{Posterior Sampling for Multi-Agent Reinforcement Learning: Solving Extensive Games with Imperfect Information}},
  author    = {Zhou, Yichi and Li, Jialian and Zhu, Jun},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/zhou2020iclr-posterior/}
}