The Asymptotic Convergence-Rate of Q-Learning
Abstract
In this paper we show that for discounted MDPs with discount factor γ > 1/2 the asymptotic rate of convergence of Q-learning is O(1/t^{R(1-γ)}) if R(1-γ) < 1/2 and O(√(log log t / t)) otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here R = p_min/p_max is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that p_min > 0, where p_min and p_max now become the minimum and maximum state-action occupation frequencies corresponding to the stationary distribution.
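To make the setting concrete, below is a minimal Python sketch (not from the paper) of tabular Q-learning in which state-action pairs are drawn i.i.d. from a fixed distribution, the regime the abstract analyzes. The random MDP, the sampling distribution p, and the 1/n(s,a) step-size schedule are illustrative assumptions; the ratio R = p_min/p_max is the quantity that enters the stated rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9  # gamma > 1/2, as in the abstract

# Illustrative random MDP: P[s, a] is a distribution over next states,
# rewards r[s, a] are fixed in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.random((n_states, n_actions))

# Fixed sampling distribution over state-action pairs; R = p_min / p_max.
p = rng.dirichlet(np.ones(n_states * n_actions))
R = p.min() / p.max()
print(f"R = {R:.3f}, R*(1-gamma) = {R * (1 - gamma):.3f}")

# Q* via value iteration, so the max-norm error ||Q_t - Q*|| can be monitored.
Q_star = np.zeros((n_states, n_actions))
for _ in range(2000):
    Q_star = r + gamma * (P @ Q_star.max(axis=1))

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))

for t in range(1, 200_001):
    # Sample (s, a) i.i.d. from the fixed distribution p, then a next state.
    idx = rng.choice(n_states * n_actions, p=p)
    s, a = divmod(int(idx), n_actions)
    s_next = rng.choice(n_states, p=P[s, a])

    # Standard Q-learning update with step size 1 / (number of visits to (s, a)).
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]
    Q[s, a] += alpha * (r[s, a] + gamma * Q[s_next].max() - Q[s, a])

    if t % 50_000 == 0:
        print(f"t = {t}: max-norm error = {np.abs(Q - Q_star).max():.4f}")
```

Under the abstract's result, when R(1-γ) < 1/2 the max-norm error in such a run should decay on the order of t^{-R(1-γ)}; one way to check this empirically is to compare the logged errors against that rate on a log-log scale.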
Cite
Text
Szepesvári. "The Asymptotic Convergence-Rate of Q-Learning." Neural Information Processing Systems, 1997.Markdown
[Szepesvári. "The Asymptotic Convergence-Rate of Q-Learning." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/szepesvari1997neurips-asymptotic/)BibTeX
@inproceedings{szepesvari1997neurips-asymptotic,
title = {{The Asymptotic Convergence-Rate of Q-Learning}},
author = {Szepesvári, Csaba},
booktitle = {Neural Information Processing Systems},
year = {1997},
pages = {1064--1070},
url = {https://mlanthology.org/neurips/1997/szepesvari1997neurips-asymptotic/}
}