The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information

Abstract

The article describes a gradient-search-based reinforcement learning algorithm for two-player zero-sum games with imperfect information. Simple gradient search may result in oscillation around solution points, a problem similar to the “Crawford puzzle”. To dampen oscillations, the algorithm uses lagging anchors, drawing the strategy state of the players toward a weighted average of earlier strategy states. The algorithm is applicable to games represented in extensive form. We develop methods for sampling the parameter gradient of a player's performance against an opponent, using temporal-difference learning. The algorithm is applied successfully to a simplified poker game with infinite sets of pure strategies, and to the air combat game Campaign, using neural networks. We prove exponential convergence of the algorithm for a subset of matrix games.
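The oscillation-damping idea in the abstract can be illustrated on a small matrix game. The sketch below is not Dahl's exact formulation: it applies a lagging-anchor-style update to matching pennies, where the payoff to player 1 is (2p − 1)(2q − 1) for mixed strategies p and q. Plain gradient ascent/descent circles the mixed equilibrium (0.5, 0.5); adding a pull toward a slowly drifting anchor (a weighted average of earlier strategy states) turns the circling into a decaying spiral. The step size `eta` and anchor weight `alpha` are illustrative choices, not values from the paper.

```python
def lagging_anchor_matching_pennies(steps=5000, eta=0.02, alpha=0.5):
    """Lagging-anchor-style gradient dynamics on matching pennies (sketch)."""
    p, q = 0.9, 0.2          # current mixed strategies: prob. of playing "heads"
    p_bar, q_bar = p, q      # lagging anchors, initialized at the start state
    for _ in range(steps):
        # Expected payoff to player 1: V(p, q) = (2p - 1)(2q - 1).
        # Player 1 ascends in p, player 2 descends in q.
        grad_p = 2.0 * (2.0 * q - 1.0)
        grad_q = -2.0 * (2.0 * p - 1.0)
        # Gradient step plus a pull toward the lagging anchor.
        p_new = p + eta * (grad_p + alpha * (p_bar - p))
        q_new = q + eta * (grad_q + alpha * (q_bar - q))
        # Anchors drift slowly toward the current strategies, so they act
        # as an exponentially weighted average of earlier strategy states.
        p_bar += eta * alpha * (p_new - p_bar)
        q_bar += eta * alpha * (q_new - q_bar)
        # Keep the probabilities valid.
        p = min(1.0, max(0.0, p_new))
        q = min(1.0, max(0.0, q_new))
    return p, q
```

With `alpha = 0`, the same iteration reduces to plain gradient dynamics and orbits the equilibrium instead of converging; any positive anchor weight makes the linearized dynamics stable, consistent with the exponential-convergence result the abstract states for a subset of matrix games.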

Cite

Text

Dahl. "The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information." Machine Learning, 2002. doi:10.1023/A:1014063505958

Markdown

[Dahl. "The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information." Machine Learning, 2002.](https://mlanthology.org/mlj/2002/dahl2002mlj-lagging/) doi:10.1023/A:1014063505958

BibTeX

@article{dahl2002mlj-lagging,
  title     = {{The Lagging Anchor Algorithm: Reinforcement Learning in Two-Player Zero-Sum Games with Imperfect Information}},
  author    = {Dahl, Fredrik A.},
  journal   = {Machine Learning},
  year      = {2002},
  pages     = {5--37},
  doi       = {10.1023/A:1014063505958},
  volume    = {49},
  url       = {https://mlanthology.org/mlj/2002/dahl2002mlj-lagging/}
}