Towards Finite-Sample Convergence of Direct Reinforcement Learning
Abstract
While direct, model-free reinforcement learning often performs better in practice than model-based approaches, so far only the latter have supported theoretical guarantees of finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and a constant learning rate is admissible. This notion justifies a greedy strategy that, we believe, performs well in practice and holds theoretical significance for deriving finite-sample convergence of direct reinforcement learning. We present empirical evidence in support of this idea.
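The abstract pins down the learning setup precisely enough to sketch: tabular Q-learning that acts purely greedily, relying on optimistic initial Q-values and a constant learning rate to drive exploration. The sketch below is our own minimal illustration of that setup, not the authors' code; the `ChainEnv` toy environment and all numerical parameters (`q_init`, `alpha`, `gamma`, episode count) are assumptions chosen for demonstration.

```python
import numpy as np

class ChainEnv:
    """Toy 5-state chain (our assumption, not from the paper):
    action 1 moves right toward a terminal reward of +1;
    action 0 resets to the start with reward 0."""
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        if a == 1:
            self.s += 1
            done = self.s == self.n_states - 1
            return self.s, (1.0 if done else 0.0), done
        self.s = 0
        return self.s, 0.0, False

def greedy_q_learning(env, n_states, n_actions,
                      episodes=500, alpha=0.1, gamma=0.95, q_init=10.0):
    """Tabular Q-learning with optimistic initial values and a constant
    learning rate, acting purely greedily (no epsilon-exploration)."""
    Q = np.full((n_states, n_actions), q_init)  # optimistic initialization
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = int(np.argmax(Q[s]))                 # greedy action choice
            s2, r, done = env.step(a)
            target = r if done else r + gamma * np.max(Q[s2])
            Q[s, a] += alpha * (target - Q[s, a])    # constant step size
            s = s2
    return Q

if __name__ == "__main__":
    Q = greedy_q_learning(ChainEnv(), n_states=5, n_actions=2)
    # Greedy policy per non-terminal state (terminal row is never updated).
    print(np.argmax(Q[:-1], axis=1))
```

Note the absence of any explicit exploration mechanism: every untried action starts with an inflated Q-value, so the greedy policy is forced to try it until experience lowers the estimate, which is the behavior the paper's admissibility argument is built around.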
Cite
Text
Lim and DeJong. "Towards Finite-Sample Convergence of Direct Reinforcement Learning." European Conference on Machine Learning, 2005. doi:10.1007/11564096_25
Markdown
[Lim and DeJong. "Towards Finite-Sample Convergence of Direct Reinforcement Learning." European Conference on Machine Learning, 2005.](https://mlanthology.org/ecmlpkdd/2005/lim2005ecml-finitesample/) doi:10.1007/11564096_25
BibTeX
@inproceedings{lim2005ecml-finitesample,
title = {{Towards Finite-Sample Convergence of Direct Reinforcement Learning}},
author = {Lim, Shiau Hong and DeJong, Gerald},
booktitle = {European Conference on Machine Learning},
year = {2005},
  pages = {230--241},
doi = {10.1007/11564096_25},
url = {https://mlanthology.org/ecmlpkdd/2005/lim2005ecml-finitesample/}
}