A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Abstract

In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players. Our theoretical results present, to the best of our knowledge, the first last-iterate finite-sample analysis of such independent learning dynamics. To establish the results, we develop a coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
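
To make the phrase "payoff-based smoothed best-response dynamics" concrete, the following is a minimal illustrative sketch for a single matrix game (matching pennies). It is not the algorithm analyzed in the paper: the function names, the two step sizes `alpha` and `beta`, the temperature `tau`, and the choice of game are all assumptions made for illustration, and the minimax value iteration layer for stochastic games is omitted entirely. Each player sees only its own realized payoff, updates a per-action payoff estimate, and moves its policy a small step toward the softmax of those estimates.

```python
import math
import random


def smoothed_best_response(q, tau):
    """Softmax over payoff estimates q with temperature tau > 0."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / tau) for v in q]
    total = sum(weights)
    return [w / total for w in weights]


def independent_step(policy, q, action, payoff, tau, alpha, beta):
    """One payoff-based update for a single player (hypothetical helper).

    The player uses only its own action and realized payoff; it never
    observes the opponent's action or policy.
    """
    q = list(q)
    # Update the payoff estimate of the action that was actually played.
    q[action] += alpha * (payoff - q[action])
    # Move the policy a small step toward the smoothed best response.
    target = smoothed_best_response(q, tau)
    policy = [p + beta * (t - p) for p, t in zip(policy, target)]
    return policy, q


def run_matching_pennies(steps=2000, tau=0.5, alpha=0.1, beta=0.01, seed=0):
    """Run both players symmetrically on matching pennies (assumed example)."""
    rng = random.Random(seed)
    payoff_matrix = [[1.0, -1.0], [-1.0, 1.0]]  # payoff to player 1
    pi1, pi2 = [0.9, 0.1], [0.2, 0.8]
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        a1 = rng.choices([0, 1], weights=pi1)[0]
        a2 = rng.choices([0, 1], weights=pi2)[0]
        r = payoff_matrix[a1][a2]
        # Zero-sum: player 2 receives the negated payoff.
        pi1, q1 = independent_step(pi1, q1, a1, r, tau, alpha, beta)
        pi2, q2 = independent_step(pi2, q2, a2, -r, tau, alpha, beta)
    return pi1, pi2
```

Both players run the same update rule, which mirrors the symmetry property named in the abstract; the sketch makes no claim about convergence rates, which are the subject of the paper's analysis.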

Cite

Text

Chen et al. "A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games." Neural Information Processing Systems, 2023.

Markdown

[Chen et al. "A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/chen2023neurips-finitesample/)

BibTeX

@inproceedings{chen2023neurips-finitesample,
  title     = {{A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games}},
  author    = {Chen, Zaiwei and Zhang, Kaiqing and Mazumdar, Eric and Ozdaglar, Asuman and Wierman, Adam},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/chen2023neurips-finitesample/}
}