Towards Finite-Sample Convergence of Direct Reinforcement Learning
Abstract
While direct, model-free reinforcement learning often performs better in practice than model-based approaches, so far only the latter have supported theoretical guarantees of finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and a constant learning rate is admissible. This notion justifies a greedy strategy that, we believe, performs well in practice and holds theoretical significance for deriving finite-sample convergence of direct reinforcement learning. We present empirical evidence in support of this idea.
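The abstract pins down the learning setup precisely enough to sketch: tabular Q-learning that acts purely greedily, relying on optimistic initial Q-values and a constant learning rate to drive exploration. The sketch below is our own minimal illustration of that setup, not the authors' code; the `ChainEnv` toy environment and all numerical parameters (`q_init`, `alpha`, `gamma`, episode count) are assumptions chosen for demonstration.

```python
import numpy as np

class ChainEnv:
    """Toy 5-state chain (our assumption, not from the paper):
    action 1 moves right toward a terminal reward of +1;
    action 0 resets to the start with reward 0."""
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        if a == 1:
            self.s += 1
            done = self.s == self.n_states - 1
            return self.s, (1.0 if done else 0.0), done
        self.s = 0
        return self.s, 0.0, False

def greedy_q_learning(env, n_states, n_actions,
                      episodes=500, alpha=0.1, gamma=0.95, q_init=10.0):
    """Tabular Q-learning with optimistic initial values and a constant
    learning rate, acting purely greedily (no epsilon-exploration)."""
    Q = np.full((n_states, n_actions), q_init)  # optimistic initialization
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = int(np.argmax(Q[s]))                 # greedy action choice
            s2, r, done = env.step(a)
            target = r if done else r + gamma * np.max(Q[s2])
            Q[s, a] += alpha * (target - Q[s, a])    # constant step size
            s = s2
    return Q

if __name__ == "__main__":
    Q = greedy_q_learning(ChainEnv(), n_states=5, n_actions=2)
    # Greedy policy per non-terminal state (terminal row is never updated).
    print(np.argmax(Q[:-1], axis=1))
```

Note the absence of any explicit exploration mechanism: every untried action starts with an inflated Q-value, so the greedy policy is forced to try it until experience lowers the estimate, which is the behavior the paper's admissibility argument is built around.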
Cite
Text
Lim and DeJong. "Towards Finite-Sample Convergence of Direct Reinforcement Learning." European Conference on Machine Learning, 2005. doi:10.1007/11564096_25
Markdown
[Lim and DeJong. "Towards Finite-Sample Convergence of Direct Reinforcement Learning." European Conference on Machine Learning, 2005.](https://mlanthology.org/ecmlpkdd/2005/lim2005ecml-finitesample/) doi:10.1007/11564096_25
BibTeX
@inproceedings{lim2005ecml-finitesample,
title = {{Towards Finite-Sample Convergence of Direct Reinforcement Learning}},
author = {Lim, Shiau Hong and DeJong, Gerald},
booktitle = {European Conference on Machine Learning},
year = {2005},
  pages = {230--241},
doi = {10.1007/11564096_25},
url = {https://mlanthology.org/ecmlpkdd/2005/lim2005ecml-finitesample/}
}