Solving Games with Functional Regret Estimation
Abstract
We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and the regret of the algorithm. As a corollary, the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
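To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of the mechanism the abstract describes: a function approximator predicts per-action regrets, and a regret-matching rule turns those estimates into a policy by playing each action in proportion to its positive estimated regret. The feature map `phi` and the linear regressor weights are hypothetical stand-ins for whatever approximator is trained online.

```python
# Illustrative sketch only: regret matching driven by *estimated* regrets
# from a function approximator, in the spirit of the paper's approach.
import numpy as np


def regret_matching_policy(estimated_regrets):
    """Map (possibly negative) regret estimates to a policy:
    play each action in proportion to its positive estimated regret."""
    positive = np.maximum(estimated_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No action has positive estimated regret: fall back to uniform play.
    return np.full_like(positive, 1.0 / len(positive))


# Hypothetical example: a linear regressor stands in for the learned
# regret estimator; phi(a) featurizes an information-set/action pair.
w = np.array([0.5, -1.0, 2.0])                      # learned weights (toy values)
phi = lambda a: np.array([1.0, a, a * a])           # toy feature map
regrets = np.array([w @ phi(a) for a in range(3)])  # estimated regret per action
print(regret_matching_policy(regrets))
```

In self-play, each player would act according to such a policy, observe counterfactual values, and update the regressor toward the true regrets; the paper's bound ties the algorithm's regret to how well the approximator fits them.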
Cite
Text
Waugh et al. "Solving Games with Functional Regret Estimation." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9445
Markdown
[Waugh et al. "Solving Games with Functional Regret Estimation." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/waugh2015aaai-solving/) doi:10.1609/AAAI.V29I1.9445
BibTeX
@inproceedings{waugh2015aaai-solving,
title = {{Solving Games with Functional Regret Estimation}},
author = {Waugh, Kevin and Morrill, Dustin and Bagnell, James Andrew and Bowling, Michael H.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2015},
pages = {2138--2145},
doi = {10.1609/AAAI.V29I1.9445},
url = {https://mlanthology.org/aaai/2015/waugh2015aaai-solving/}
}