Exploration Analysis in Finite-Horizon Turn-Based Stochastic Games
Abstract
The exploration-exploitation trade-off is one of the key concerns in reinforcement learning. Previous work on single-player Markov Decision Processes has achieved near-optimal results for both PAC and high-probability regret guarantees. However, such an analysis is lacking for the more complex multi-player stochastic games, where all players aim to find an approximate Nash Equilibrium. In this work, we address the exploration issue for $N$-player finite-horizon turn-based stochastic games (FTSGs). We propose a framework, \textit{Upper Bounding the Values for Players} (UBVP), to guide exploration in FTSGs. UBVP leverages the key insight that all players choose their optimal policies simultaneously, each conditioned on the policies of the others; thus players can explore \textit{in the face of uncertainty} and approach the Nash Equilibrium. Based on UBVP, we present two provable algorithms: one is \textit{Uniform}-PAC with a sample complexity of $\tilde{O}(1/\epsilon^2)$ to obtain an $\epsilon$-Nash Equilibrium for arbitrary $\epsilon>0$, and the other achieves a cumulative exploitability of $\tilde{O}(\sqrt{T})$ with high probability.
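To make the optimism principle behind UBVP concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithm): optimistic backward induction for a finite-horizon turn-based game, where the player acting at each state receives an exploration bonus on its own value and acts greedily with respect to that upper bound. All names (optimistic_values, owner, counts) and the bonus form are illustrative assumptions.

import numpy as np

def optimistic_values(P_hat, R_hat, counts, owner, H, n_players, c=1.0):
    """Per-player optimistic value upper bounds via backward induction.

    Illustrative sketch only; assumes per-step rewards in [0, 1].
    P_hat:  (S, A, S) empirical transition probabilities
    R_hat:  (S, A, n_players) empirical mean reward for each player
    counts: (S, A) visit counts used for the exploration bonus
    owner:  (S,) index of the player who acts in each state
    """
    S, A, _ = P_hat.shape
    # V[h, s, i]: optimistic value of player i at state s with h steps elapsed
    V = np.zeros((H + 1, S, n_players))
    pi = np.zeros((H, S), dtype=int)  # greedy action of the acting player
    for h in range(H - 1, -1, -1):
        for s in range(S):
            # Hoeffding-style bonus shrinking with visit counts (an assumption)
            bonus = c * np.sqrt(1.0 / np.maximum(counts[s], 1))  # shape (A,)
            # Optimistic Q-values for every player under each action at s
            Q = R_hat[s] + P_hat[s] @ V[h + 1]   # shape (A, n_players)
            Q[:, owner[s]] += bonus              # optimism for the acting player
            a = int(np.argmax(Q[:, owner[s]]))   # actor maximizes its own bound
            pi[h, s] = a
            V[h, s] = np.minimum(Q[a], H)        # clip to the trivial upper bound
    return V, pi

A full algorithm would execute pi, update counts and the empirical model, and recompute; the sketch shows only the "upper bounding the values for players" planning step.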
Cite
Text
Li et al. "Exploration Analysis in Finite-Horizon Turn-Based Stochastic Games." Uncertainty in Artificial Intelligence, 2020.
Markdown
[Li et al. "Exploration Analysis in Finite-Horizon Turn-Based Stochastic Games." Uncertainty in Artificial Intelligence, 2020.](https://mlanthology.org/uai/2020/li2020uai-exploration/)
BibTeX
@inproceedings{li2020uai-exploration,
  title = {{Exploration Analysis in Finite-Horizon Turn-Based Stochastic Games}},
  author = {Li, Jialian and Zhou, Yichi and Ren, Tongzheng and Zhu, Jun},
  booktitle = {Uncertainty in Artificial Intelligence},
  year = {2020},
  pages = {201-210},
  volume = {124},
  url = {https://mlanthology.org/uai/2020/li2020uai-exploration/}
}