Tighter Problem-Dependent Regret Bounds in Reinforcement Learning Without Domain Knowledge Using Value Function Bounds

Abstract

Strong worst-case performance bounds for episodic reinforcement learning exist, but fortunately in practice RL algorithms often perform much better than such bounds would predict. Algorithms and theory that provide strong problem-dependent bounds could help illuminate the key features of what makes an RL problem hard and reduce the barrier to using RL algorithms in practice. As a step towards this, we derive an algorithm and analysis for finite-horizon discrete MDPs with state-of-the-art worst-case regret bounds and substantially tighter bounds when the RL environment has special features, without requiring a priori knowledge of the environment. As a result of our analysis, we also help address an open learning-theory question (Jiang & Agarwal, 2018) about episodic MDPs with a constant upper bound on the sum of rewards, providing a regret bound that is a function of the number of episodes with no dependence on the horizon.
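
A minimal sketch of the setting, to make the abstract's claims concrete (the notation below is ours and the displayed rate is schematic, not quoted from the paper): in a tabular episodic MDP with $S$ states, $A$ actions, horizon $H$, $K$ episodes, and $T = KH$ total timesteps, the regret of an algorithm that plays policy $\pi_k$ in episode $k$ starting from state $s_1^{k}$ is

$$
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Big( V_1^{*}(s_1^{k}) - V_1^{\pi_k}(s_1^{k}) \Big),
$$

where $V_1^{*}$ is the optimal value function. The known worst-case (minimax) rate for this setting is $\tilde{\Theta}\big(\sqrt{H S A T}\big)$; a problem-dependent bound of the kind developed in this paper replaces the explicit horizon dependence with a quantity derived from bounds on the optimal value function, which can be much smaller in benign environments, and which removes the horizon dependence entirely when the total reward in every episode is bounded by a constant.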

Cite

Text

Zanette and Brunskill. "Tighter Problem-Dependent Regret Bounds in Reinforcement Learning Without Domain Knowledge Using Value Function Bounds." International Conference on Machine Learning, 2019.

Markdown

[Zanette and Brunskill. "Tighter Problem-Dependent Regret Bounds in Reinforcement Learning Without Domain Knowledge Using Value Function Bounds." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/zanette2019icml-tighter/)

BibTeX

@inproceedings{zanette2019icml-tighter,
  title     = {{Tighter Problem-Dependent Regret Bounds in Reinforcement Learning Without Domain Knowledge Using Value Function Bounds}},
  author    = {Zanette, Andrea and Brunskill, Emma},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {7304--7312},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/zanette2019icml-tighter/}
}