Convergence and Divergence in Standard and Averaging Reinforcement Learning
Abstract
Although tabular reinforcement learning (RL) methods have been proven to converge to an optimal policy, combining particular conventional RL techniques with function approximators can lead to divergence. In this paper we show why off-policy RL methods combined with linear function approximators can diverge. Furthermore, we analyze two different types of updates: standard and averaging RL updates. Although averaging RL methods will not diverge, we show that they can converge to wrong value functions. In our experiments we compare standard and averaging value iteration (VI) with CMACs; the results show that averaging VI works better for small values of the discount factor, whereas standard VI performs better for large values of the discount factor, although it does not always converge.
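The distinction between the two update types can be illustrated with a small sketch. The toy MDP, feature layout, and step sizes below are invented for illustration and the exact update forms used in the paper may differ; the sketch only contrasts a standard gradient-style TD update, where the full error is applied along the feature vector, with an averaging update, where each active feature's weight moves toward a per-feature share of the target (a convex combination, which cannot diverge).

```python
import numpy as np

# Hypothetical setup: 5 states, 4 binary features (a crude stand-in for
# overlapping CMAC tiles); not the configuration used in the paper.
n_states, n_features = 5, 4
rng = np.random.default_rng(0)
phi = (rng.random((n_states, n_features)) < 0.5).astype(float)
phi[phi.sum(axis=1) == 0, 0] = 1.0  # every state activates at least one feature

theta_std = np.zeros(n_features)  # weights for the standard update
theta_avg = np.zeros(n_features)  # weights for the averaging update
alpha, gamma = 0.1, 0.9
reward = np.ones(n_states)        # constant reward, purely illustrative

for _ in range(200):
    for s in range(n_states):
        s_next = (s + 1) % n_states  # deterministic cyclic transitions

        # Standard (gradient-style) update: the whole TD error is
        # distributed along the feature vector.
        target = reward[s] + gamma * phi[s_next] @ theta_std
        theta_std += alpha * (target - phi[s] @ theta_std) * phi[s]

        # Averaging update: each active feature's weight moves toward its
        # share of the target, so the new weight is a weighted average of
        # the old weight and the target (bounded by construction).
        target_a = reward[s] + gamma * phi[s_next] @ theta_avg
        active = phi[s] > 0
        theta_avg[active] += alpha * (target_a / active.sum() - theta_avg[active])

print(theta_std, theta_avg)
```

Both weight vectors stay bounded on this benign on-policy sweep; the paper's point is that under off-policy sampling the standard update can still diverge, while the averaging update cannot, at the cost of possibly converging to a wrong value function.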
Cite
Text
Wiering. "Convergence and Divergence in Standard and Averaging Reinforcement Learning." European Conference on Machine Learning, 2004. doi:10.1007/978-3-540-30115-8_44
Markdown
[Wiering. "Convergence and Divergence in Standard and Averaging Reinforcement Learning." European Conference on Machine Learning, 2004.](https://mlanthology.org/ecmlpkdd/2004/wiering2004ecml-convergence/) doi:10.1007/978-3-540-30115-8_44
BibTeX
@inproceedings{wiering2004ecml-convergence,
title = {{Convergence and Divergence in Standard and Averaging Reinforcement Learning}},
author = {Wiering, Marco A.},
booktitle = {European Conference on Machine Learning},
year = {2004},
pages = {477--488},
doi = {10.1007/978-3-540-30115-8_44},
url = {https://mlanthology.org/ecmlpkdd/2004/wiering2004ecml-convergence/}
}