Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning

Abstract

Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. Focusing on the synchronous setting (where independent samples for all state-action pairs are queried via a generative model in each iteration), recent work has made substantial progress towards understanding the sample efficiency of Q-learning. To yield an entrywise $\varepsilon$-accurate estimate of the optimal Q-function, state-of-the-art theory requires at least an order of $\frac{|S||A|}{(1-\gamma)^5\varepsilon^2}$ samples in the infinite-horizon $\gamma$-discounted setting. In this work, we sharpen the sample complexity of synchronous Q-learning to the order of $\frac{|S||A|}{(1-\gamma)^4\varepsilon^2}$ (up to some logarithmic factor) for any $0<\varepsilon<1$, leading to an order-wise improvement in $\frac{1}{1-\gamma}$. Analogous results are derived for finite-horizon MDPs as well. Notably, our sample complexity analysis unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage. Our result is obtained by identifying novel error decompositions and recursion relations, which might shed light on how to study other variants of Q-learning.
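
To make the synchronous setting concrete, the following is a minimal illustrative sketch (in Python/NumPy) of the vanilla Q-learning iteration described in the abstract: in every iteration, a fresh next state is sampled from the generative model for every state-action pair, and each entry of the Q-estimate is updated via a stochastic Bellman backup. The transition kernel `P`, reward table `r`, and the learning-rate schedule `eta` below are illustrative assumptions, not the exact quantities or schedule analyzed in the paper.

```python
import numpy as np

def synchronous_q_learning(P, r, gamma, num_iters, rng=None):
    """Vanilla synchronous Q-learning with a generative model (illustrative sketch).

    P     : (S, A, S) array of transition probabilities P[s, a, s'].
    r     : (S, A) array of rewards, assumed to lie in [0, 1].
    gamma : discount factor in (0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    S, A = r.shape
    Q = np.zeros((S, A))
    for t in range(1, num_iters + 1):
        # Illustrative rescaled-linear learning rate (an assumption, not the paper's schedule).
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)
        V = Q.max(axis=1)  # V_t(s) = max_a Q_t(s, a)
        Q_new = np.empty_like(Q)
        # Synchronous setting: one fresh sample s' ~ P(.|s, a) for EVERY (s, a) pair,
        # followed by a simultaneous update of all |S||A| entries.
        for s in range(S):
            for a in range(A):
                s_next = rng.choice(S, p=P[s, a])
                target = r[s, a] + gamma * V[s_next]  # stochastic Bellman backup
                Q_new[s, a] = (1 - eta) * Q[s, a] + eta * target
        Q = Q_new
    return Q
```

Each iteration of this loop queries $|S||A|$ fresh samples from the generative model; the paper's main result shows that, up to logarithmic factors, a total of order $\frac{|S||A|}{(1-\gamma)^4\varepsilon^2}$ such samples suffices for the vanilla update rule to return an entrywise $\varepsilon$-accurate estimate of the optimal Q-function.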

Cite

Text

Li et al. "Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning." International Conference on Machine Learning, 2021.

Markdown

[Li et al. "Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/li2021icml-tightening/)

BibTeX

@inproceedings{li2021icml-tightening,
  title     = {{Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning}},
  author    = {Li, Gen and Cai, Changxiao and Chen, Yuxin and Gu, Yuantao and Wei, Yuting and Chi, Yuejie},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {6296--6306},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/li2021icml-tightening/}
}