Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions
Abstract
Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per-iteration time cost of CFR by traversing a sampled portion of the tree. The previously most effective instances of MCCFR can still be very slow in games with many player actions, since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.
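To make the sampling idea concrete, here is a minimal Python sketch of how an average-strategy-based action sampler might look. The abstract only states that AS samples a subset of a player's actions according to the player's average strategy; the specific functional form below, the parameter names epsilon, tau, and beta, and the function sample_actions are illustrative assumptions, not taken verbatim from the paper.

import random

def sample_actions(cumulative_strategy, epsilon=0.05, tau=1000.0, beta=1e6):
    """Hypothetical sketch of average-strategy sampling (AS).

    Each action is kept independently with a probability that grows with
    its weight in the cumulative (average) strategy, floored at epsilon so
    every action retains some chance of being explored. The exact form of
    the probability and the parameters epsilon/tau/beta are assumptions.
    """
    total = sum(cumulative_strategy.values())
    sampled = []
    for action, weight in cumulative_strategy.items():
        # Probability of traversing this action on the current iteration.
        rho = max(epsilon, min(1.0, (beta + tau * weight) / (beta + total)))
        if random.random() < rho:
            sampled.append(action)
    # Ensure at least one action is traversed on this iteration.
    if not sampled:
        sampled.append(random.choice(list(cumulative_strategy)))
    return sampled

# Example: actions favored by the average strategy are sampled more often.
s_bar = {"fold": 120.0, "call": 900.0, "raise": 30.0}
print(sample_actions(s_bar))

The intuition this sketch captures is the trade-off the abstract describes: in games with many player actions, traversing only the actions that the average strategy favors (plus an exploration floor) avoids the cost of sampling every action for a given player.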
Cite
Text
Burch et al. "Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions." Neural Information Processing Systems, 2012.
Markdown
[Burch et al. "Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/burch2012neurips-efficient/)
BibTeX
@inproceedings{burch2012neurips-efficient,
title = {{Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions}},
author = {Burch, Neil and Lanctot, Marc and Szafron, Duane and Gibson, Richard G.},
booktitle = {Neural Information Processing Systems},
year = {2012},
pages = {1880--1888},
url = {https://mlanthology.org/neurips/2012/burch2012neurips-efficient/}
}