Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

Abstract

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to two-player zero-sum Stochastic Games. We provide a novel and unified error propagation analysis in $L_p$-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is $\frac{2\gamma}{(1-\gamma)^2}\epsilon + \frac{1}{(1-\gamma)^2}\epsilon'$-optimal, where $\epsilon$ is the value function approximation error and $\epsilon'$ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite-horizon $\gamma$-discounted two-player zero-sum stochastic games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decision Processes in a batch setting) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
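The sketch below illustrates the general flavour of a batch Fitted-Q-style iteration for a two-player zero-sum stochastic game; it is not the authors' AGPI-Q algorithm, only a minimal illustration under stated assumptions. It assumes a batch of transitions (s, a, b, r, s') where player 1 plays action a, player 2 plays action b, and r is player 1's reward; the backup uses a max-min over the next-state Q-matrix, which is exact for turn-based games but only a conservative surrogate for the minimax value of a simultaneous-move game such as Alesia. The regressor choice (extremely randomized trees) mirrors the non-parametric spirit of Fitted-Q.

```python
# Hypothetical sketch of a Fitted-Q-style iteration for a zero-sum two-player
# stochastic game. NOT the paper's AGPI-Q; all names and choices here are
# illustrative assumptions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def fitted_q_zero_sum(batch, n_actions1, n_actions2, gamma=0.9, n_iter=50):
    """batch: list of (state, a, b, reward, next_state), with states as 1-D arrays."""
    # Features are (state, a, b) triples; targets are bootstrapped returns.
    feats = np.array([np.concatenate([s, [a, b]]) for s, a, b, _, _ in batch])
    rewards = np.array([r for _, _, _, r, _ in batch])
    next_states = [s2 for _, _, _, _, s2 in batch]

    model = None
    for _ in range(n_iter):
        if model is None:
            targets = rewards  # first iteration: Q_0 = 0
        else:
            # Build the Q-matrix over joint actions at each next state,
            # then back up player 1's max-min value.
            values = np.empty(len(batch))
            for i, s2 in enumerate(next_states):
                q = np.array([
                    [model.predict(np.concatenate([s2, [a, b]])[None, :])[0]
                     for b in range(n_actions2)]
                    for a in range(n_actions1)
                ])
                values[i] = q.min(axis=1).max()  # max over a of min over b
            targets = rewards + gamma * values
        model = ExtraTreesRegressor(n_estimators=50).fit(feats, targets)
    return model
```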

Cite

Text

Perolat et al. "Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games." International Conference on Machine Learning, 2015.

Markdown

[Perolat et al. "Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games." International Conference on Machine Learning, 2015.](https://mlanthology.org/icml/2015/perolat2015icml-approximate/)

BibTeX

@inproceedings{perolat2015icml-approximate,
  title     = {{Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games}},
  author    = {Perolat, Julien and Scherrer, Bruno and Piot, Bilal and Pietquin, Olivier},
  booktitle = {International Conference on Machine Learning},
  year      = {2015},
  pages     = {1321-1329},
  volume    = {37},
  url       = {https://mlanthology.org/icml/2015/perolat2015icml-approximate/}
}