Tighter Problem-Dependent Regret Bounds in Reinforcement Learning Without Domain Knowledge Using Value Function Bounds
Abstract
Strong worst-case performance bounds for episodic reinforcement learning exist, but fortunately, in practice RL algorithms perform much better than such bounds would predict. Algorithms and theory that provide strong problem-dependent bounds could help illuminate the key features that make an RL problem hard and reduce the barrier to using RL algorithms in practice. As a step towards this, we derive an algorithm and analysis for finite-horizon discrete MDPs with state-of-the-art worst-case regret bounds and substantially tighter bounds when the RL environment has special features, without the algorithm requiring a priori knowledge of the environment. As a result of our analysis, we also help address an open learning-theory question (Jiang and Agarwal, 2018) about episodic MDPs with a constant upper bound on the sum of rewards, providing a regret bound that scales with the number of episodes and has no dependence on the horizon.
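For context, a minimal sketch of the quantity such bounds control: the cumulative regret over K episodes in a finite-horizon episodic MDP, written in standard notation that is assumed here rather than taken verbatim from the paper.

```latex
% Standard episodic regret, shown only as context for the bounds discussed above.
% Assumed (standard) notation, not copied from the paper:
%   V^*_1       : optimal value function at the first timestep
%   V^{\pi_k}_1 : value of the policy \pi_k executed in episode k
%   s_1^k       : initial state of episode k
\[
  \mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Bigl( V^*_1\bigl(s_1^k\bigr) - V^{\pi_k}_1\bigl(s_1^k\bigr) \Bigr).
\]
% Worst-case analyses bound this quantity for any MDP in the class; the
% problem-dependent bounds described in the abstract are tighter when
% environment-specific quantities (e.g., the range of the optimal value
% function, as when the total reward per episode is bounded by a constant)
% are small.
```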
Cite
Zanette and Brunskill. "Tighter Problem-Dependent Regret Bounds in Reinforcement Learning Without Domain Knowledge Using Value Function Bounds." International Conference on Machine Learning, 2019.
BibTeX
@inproceedings{zanette2019icml-tighter,
title = {{Tighter Problem-Dependent Regret Bounds in Reinforcement Learning Without Domain Knowledge Using Value Function Bounds}},
author = {Zanette, Andrea and Brunskill, Emma},
booktitle = {International Conference on Machine Learning},
year = {2019},
pages = {7304--7312},
volume = {97},
url = {https://mlanthology.org/icml/2019/zanette2019icml-tighter/}
}