Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Brown, Noam; Bakhtin, Anton; Lerer, Adam; Gong, Qucheng

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong

NeurIPS 2020

/neurips/2020/brown2020neurips-combining/

Abstract

The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.

PDF NeurIPS Semantic Scholar

Cite

Text

Brown et al. "Combining Deep Reinforcement Learning and Search for Imperfect-Information Games." Neural Information Processing Systems, 2020.

Markdown

[Brown et al. "Combining Deep Reinforcement Learning and Search for Imperfect-Information Games." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/brown2020neurips-combining/)

BibTeX

@inproceedings{brown2020neurips-combining,
  title     = {{Combining Deep Reinforcement Learning and Search for Imperfect-Information Games}},
  author    = {Brown, Noam and Bakhtin, Anton and Lerer, Adam and Gong, Qucheng},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/brown2020neurips-combining/}
}