Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Abstract

The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games and single-agent settings, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
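
To make the convergence claim concrete: ReBeL itself combines self-play reinforcement learning with search over public belief states and learned value functions, but the simplest related component is regret minimization in self-play, whose average strategies converge to a Nash equilibrium in any two-player zero-sum game. The sketch below is a generic, hypothetical illustration of that component (regret matching on rock-paper-scissors), not code from the paper.

import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
PAYOFF = np.array([
    [ 0, -1,  1],  # rock
    [ 1,  0, -1],  # paper
    [-1,  1,  0],  # scissors
])

def regret_matching(regret_sum):
    """Mix actions in proportion to positive cumulative regret."""
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(3, 1.0 / 3.0)

# Break the symmetry so the dynamics do not start at the equilibrium.
regret_sum = [np.array([1.0, 0.0, 0.0]), np.zeros(3)]
strategy_sum = [np.zeros(3), np.zeros(3)]

for _ in range(10_000):
    strategies = [regret_matching(r) for r in regret_sum]
    # Expected value of each pure action against the opponent's current mix.
    action_values = [PAYOFF @ strategies[1], -(PAYOFF.T @ strategies[0])]
    for p in range(2):
        strategy_sum[p] += strategies[p]
        # Regret of each action relative to the current mixed strategy.
        regret_sum[p] += action_values[p] - strategies[p] @ action_values[p]

# Average strategies approach the Nash equilibrium (1/3, 1/3, 1/3).
print([s / s.sum() for s in strategy_sum])

In ReBeL, the analogous regret minimization happens inside subgames during search, with leaf values supplied by a learned value network over public belief states rather than by the exact game payoffs used in this toy.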

Cite

Text

Brown et al. "Combining Deep Reinforcement Learning and Search for Imperfect-Information Games." Neural Information Processing Systems, 2020.

Markdown

[Brown et al. "Combining Deep Reinforcement Learning and Search for Imperfect-Information Games." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/brown2020neurips-combining/)

BibTeX

@inproceedings{brown2020neurips-combining,
  title     = {{Combining Deep Reinforcement Learning and Search for Imperfect-Information Games}},
  author    = {Brown, Noam and Bakhtin, Anton and Lerer, Adam and Gong, Qucheng},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/brown2020neurips-combining/}
}