Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
Abstract
Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per-iteration time cost of CFR by traversing a sampled portion of the tree. The previously most effective instances of MCCFR can still be very slow in games with many player actions, since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.
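To make the sampling idea concrete, here is a minimal Python sketch of how an average-strategy-based action sampler might look. The abstract only states that AS samples a subset of a player's actions according to the player's average strategy; the specific functional form below, the parameter names epsilon, tau, and beta, and the function sample_actions are illustrative assumptions, not taken verbatim from the paper.

import random

def sample_actions(cumulative_strategy, epsilon=0.05, tau=1000.0, beta=1e6):
    """Hypothetical sketch of average-strategy sampling (AS).

    Each action is kept independently with a probability that grows with
    its weight in the cumulative (average) strategy, floored at epsilon so
    every action retains some chance of being explored. The exact form of
    the probability and the parameters epsilon/tau/beta are assumptions.
    """
    total = sum(cumulative_strategy.values())
    sampled = []
    for action, weight in cumulative_strategy.items():
        # Probability of traversing this action on the current iteration.
        rho = max(epsilon, min(1.0, (beta + tau * weight) / (beta + total)))
        if random.random() < rho:
            sampled.append(action)
    # Ensure at least one action is traversed on this iteration.
    if not sampled:
        sampled.append(random.choice(list(cumulative_strategy)))
    return sampled

# Example: actions favored by the average strategy are sampled more often.
s_bar = {"fold": 120.0, "call": 900.0, "raise": 30.0}
print(sample_actions(s_bar))

The intuition this sketch captures is the trade-off the abstract describes: in games with many player actions, traversing only the actions that the average strategy favors (plus an exploration floor) avoids the cost of sampling every action for a given player.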
Cite
Text
Burch et al. "Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions." Neural Information Processing Systems, 2012.
Markdown
[Burch et al. "Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/burch2012neurips-efficient/)
BibTeX
@inproceedings{burch2012neurips-efficient,
title = {{Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions}},
author = {Burch, Neil and Lanctot, Marc and Szafron, Duane and Gibson, Richard G.},
booktitle = {Neural Information Processing Systems},
year = {2012},
pages = {1880--1888},
url = {https://mlanthology.org/neurips/2012/burch2012neurips-efficient/}
}