Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
Abstract
In this paper we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$ we provide an algorithm that computes an $\epsilon$-optimal strategy with high-probability given $\tilde{O}((1 - \gamma)^{-3} \epsilon^{-2})$ samples from the transition function for each state-action-pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. As stochastic games generalize Markov decision processes (MDPs) our runtime and sample complexities are optimal due to \cite{azar2013minimax}. We achieve our results by showing how to generalize a near-optimal Q-learning based algorithms for MDP, in particular \cite{sidford2018near}, to two-player strategy computation algorithms. This overcomes limitations of standard Q-learning and strategy iteration or alternating minimization based approaches and we hope will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multi-agent settings with little loss.
Cite
Text
Sidford et al. "Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity." Artificial Intelligence and Statistics, 2020.Markdown
[Sidford et al. "Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/sidford2020aistats-solving/)BibTeX
@inproceedings{sidford2020aistats-solving,
title = {{Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity}},
author = {Sidford, Aaron and Wang, Mengdi and Yang, Lin and Ye, Yinyu},
booktitle = {Artificial Intelligence and Statistics},
year = {2020},
pages = {2992-3002},
volume = {108},
url = {https://mlanthology.org/aistats/2020/sidford2020aistats-solving/}
}