A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

NeurIPS 2023

Abstract

In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players. Our theoretical results present, to the best of our knowledge, the first last-iterate finite-sample analysis of such independent learning dynamics. To establish the results, we develop a coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
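
To make the term "smoothed best response" concrete, the following is a minimal sketch of a single player's update in a matrix game. It assumes a softmax (entropy-regularized) best response, which is a common choice; the abstract does not state the exact form used in the paper. The names `smoothed_best_response`, `policy_step`, `tau`, and `beta` are hypothetical and chosen only for illustration. This is not the paper's full algorithm, which also involves payoff-based value estimation and a value-iteration outer loop.

```python
import math


def smoothed_best_response(q, tau):
    """Softmax over per-action payoff estimates q with temperature tau > 0.

    Assumption: softmax smoothing; smaller tau is closer to a hard argmax.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((x - m) / tau) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def policy_step(policy, q, tau, beta):
    """Move the policy a fraction beta in (0, 1] toward the smoothed best response."""
    target = smoothed_best_response(q, tau)
    return [p + beta * (t - p) for p, t in zip(policy, target)]


# Hypothetical usage: two actions, the first currently looks better.
policy = [0.5, 0.5]
q_estimate = [1.0, 0.0]
new_policy = policy_step(policy, q_estimate, tau=0.1, beta=0.5)
```

Because the update is a convex combination of two probability vectors, `new_policy` remains a valid distribution, and it shifts weight toward the action with the higher payoff estimate. Each player can run such an update using only its own payoff estimates, which is the sense in which dynamics of this kind are independent.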

Cite

Text

Chen et al. "A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games." Neural Information Processing Systems, 2023.

Markdown

[Chen et al. "A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/chen2023neurips-finitesample/)

BibTeX

@inproceedings{chen2023neurips-finitesample,
  title     = {{A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games}},
  author    = {Chen, Zaiwei and Zhang, Kaiqing and Mazumdar, Eric and Ozdaglar, Asuman and Wierman, Adam},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/chen2023neurips-finitesample/}
}