Learning Rates for Q-Learning

Abstract

In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in the Q-learning. For a polynomial learning rate, one which is 1/ t ^ω at time t where ω ε (1/2, 1), we show that that the convergence rate is polynomial in 1/(1 - γ), where γ is the discount factor. In contrast we show that for a linear learning rate, one which is 1/ t at time t , the convergence rate has an exponential dependence on 1/(1 - γ). In addition we show a simple example that proves that this exponential behavior is inherent for a linear learning rate.

Cite

Text

Even-Dar and Mansour. "Learning Rates for Q-Learning." Annual Conference on Computational Learning Theory, 2001. doi:10.1007/3-540-44581-1_39

Markdown

[Even-Dar and Mansour. "Learning Rates for Q-Learning." Annual Conference on Computational Learning Theory, 2001.](https://mlanthology.org/colt/2001/evendar2001colt-learning/) doi:10.1007/3-540-44581-1_39

BibTeX

@inproceedings{evendar2001colt-learning,
  title     = {{Learning Rates for Q-Learning}},
  author    = {Even-Dar, Eyal and Mansour, Yishay},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2001},
  pages     = {589-604},
  doi       = {10.1007/3-540-44581-1_39},
  url       = {https://mlanthology.org/colt/2001/evendar2001colt-learning/}
}