The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information
Abstract
This article describes a gradient-search-based reinforcement learning algorithm for two-player zero-sum games with imperfect information. Simple gradient search may result in oscillation around solution points, a problem similar to the “Crawford puzzle”. To dampen oscillations, the algorithm uses lagging anchors, drawing the strategy state of the players toward a weighted average of earlier strategy states. The algorithm is applicable to games represented in extensive form. We develop methods for sampling the parameter gradient of a player's performance against an opponent, using temporal-difference learning. The algorithm is applied successfully to a simplified poker game with infinite sets of pure strategies, and to the air combat game Campaign, using neural nets. We prove exponential convergence of the algorithm for a subset of matrix games.
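The anchoring mechanism summarized in the abstract can be sketched on a small matrix game. The step sizes, the specific game (matching pennies), and the simplex projection below are illustrative assumptions, not the paper's exact formulation: each player ascends the gradient of its expected payoff while being pulled toward a lagging anchor that slowly tracks its own strategy.

```python
import numpy as np

# Illustrative sketch of a lagging-anchor update on matching pennies.
# Step sizes eta and alpha are arbitrary choices for demonstration.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # payoff matrix for player 1 (zero-sum)

def project(p):
    # Crude projection back onto the probability simplex (an assumption,
    # not the paper's method): clip away negatives and renormalize.
    p = np.clip(p, 1e-6, None)
    return p / p.sum()

eta, alpha = 0.1, 0.1          # gradient step and anchor tracking rate
v = np.array([0.9, 0.1])       # player 1 mixed strategy
w = np.array([0.2, 0.8])       # player 2 mixed strategy
v_bar, w_bar = v.copy(), w.copy()  # lagging anchors

for _ in range(2000):
    gv = A @ w                 # gradient of v^T A w with respect to v
    gw = -A.T @ v              # player 2 minimizes the same payoff
    # Gradient step plus a pull toward the lagging anchor:
    v = project(v + eta * gv + eta * (v_bar - v))
    w = project(w + eta * gw + eta * (w_bar - w))
    # Anchors drift slowly toward the current strategies:
    v_bar += alpha * (v - v_bar)
    w_bar += alpha * (w - w_bar)

# Plain gradient dynamics would orbit the mixed equilibrium (1/2, 1/2);
# the anchor term damps the oscillation so both strategies converge to it.
```

Without the `(v_bar - v)` and `(w_bar - w)` terms, the strategies circle the equilibrium indefinitely, which is the oscillation problem the abstract refers to.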
Cite
Text
Dahl. "The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information." Machine Learning, 2002. doi:10.1023/A:1014063505958
Markdown
[Dahl. "The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information." Machine Learning, 2002.](https://mlanthology.org/mlj/2002/dahl2002mlj-lagging/) doi:10.1023/A:1014063505958
BibTeX
@article{dahl2002mlj-lagging,
title = {{The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information}},
author = {Dahl, Fredrik A.},
journal = {Machine Learning},
year = {2002},
pages = {5--37},
doi = {10.1023/A:1014063505958},
volume = {49},
url = {https://mlanthology.org/mlj/2002/dahl2002mlj-lagging/}
}