Reinforcement Learning for Average Reward Zero-Sum Games

Abstract

We consider Reinforcement Learning for average reward zero-sum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the second on Q-learning for stochastic shortest path games. Convergence is proved using the ODE (Ordinary Differential Equation) method. We further discuss the case where not all the actions are played by the opponent with comparable frequencies and present an algorithm that converges to the optimal Q-function, given the observed play of the opponent.
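The first algorithm mentioned in the abstract builds on relative Q-learning, in which each update bootstraps on the value of the one-stage zero-sum matrix game at the next state, offset by a reference entry of the Q-table that pins down the average-reward scale. A minimal illustrative sketch is below; it is not the paper's exact algorithm, and the function names, the choice of reference entry, and the use of SciPy's LP solver for the matrix-game value are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of a zero-sum matrix game (row player maximizes) via an LP.

    Variables are the row player's mixed strategy x (m weights) plus the
    game value v; we maximize v subject to v <= x^T A e_j for every
    opponent column j, with x on the probability simplex.
    """
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # row j: v - sum_i A[i,j] x_i <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                              # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]      # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def relative_q_update(Q, s, a, b, r, s_next, alpha, ref=(0, 0, 0)):
    """One relative Q-learning step for an average-reward zero-sum game (sketch).

    Q has shape (n_states, n_row_actions, n_col_actions). The reference
    entry Q[ref] plays the role of the average-reward offset, as in
    relative-value-iteration-style Q-learning; its choice here is an
    illustrative assumption.
    """
    target = r + matrix_game_value(Q[s_next]) - Q[ref]
    Q[s, a, b] += alpha * (target - Q[s, a, b])
    return Q
```

For example, on matching pennies the matrix-game value is 0, and a single update from a zero-initialized table with reward 1 and step size 0.5 moves the visited entry to 0.5.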

Cite

Text

Mannor. "Reinforcement Learning for Average Reward Zero-Sum Games." Annual Conference on Computational Learning Theory, 2004. doi:10.1007/978-3-540-27819-1_4

Markdown

[Mannor. "Reinforcement Learning for Average Reward Zero-Sum Games." Annual Conference on Computational Learning Theory, 2004.](https://mlanthology.org/colt/2004/mannor2004colt-reinforcement/) doi:10.1007/978-3-540-27819-1_4

BibTeX

@inproceedings{mannor2004colt-reinforcement,
  title     = {{Reinforcement Learning for Average Reward Zero-Sum Games}},
  author    = {Mannor, Shie},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2004},
  pages     = {49--63},
  doi       = {10.1007/978-3-540-27819-1_4},
  url       = {https://mlanthology.org/colt/2004/mannor2004colt-reinforcement/}
}