Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Aaron Sidford, Mengdi Wang, Lin Yang, Yinyu Ye

AISTATS 2020 pp. 2992-3002

/aistats/2020/sidford2020aistats-solving/

Abstract

In this paper we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$ we provide an algorithm that computes an $\epsilon$-optimal strategy with high-probability given $\tilde{O}((1 - \gamma)^{-3} \epsilon^{-2})$ samples from the transition function for each state-action-pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. As stochastic games generalize Markov decision processes (MDPs) our runtime and sample complexities are optimal due to \cite{azar2013minimax}. We achieve our results by showing how to generalize a near-optimal Q-learning based algorithms for MDP, in particular \cite{sidford2018near}, to two-player strategy computation algorithms. This overcomes limitations of standard Q-learning and strategy iteration or alternating minimization based approaches and we hope will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multi-agent settings with little loss.

PDF AISTATS Semantic Scholar

Cite

Text

Sidford et al. "Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity." Artificial Intelligence and Statistics, 2020.

Markdown

[Sidford et al. "Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/sidford2020aistats-solving/)

BibTeX

@inproceedings{sidford2020aistats-solving,
  title     = {{Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity}},
  author    = {Sidford, Aaron and Wang, Mengdi and Yang, Lin and Ye, Yinyu},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {2992-3002},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/sidford2020aistats-solving/}
}