Bayesian Reinforcement Learning with Exploration

Abstract

We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
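
As a rough illustration of the kind of combination the abstract describes, the sketch below alternates between a Bayes-optimal (exploiting) policy and an information-seeking (exploring) policy, switching on a simple posterior-concentration trigger. This is a minimal toy under strong simplifying assumptions and is not the paper's algorithm: the two-element hypothesis class, the myopic value estimates, the disagreement-based exploration rule, and the threshold epsilon are all hypothetical placeholders chosen for brevity.

import random

# Toy illustration (NOT the paper's algorithm): a discrete class of
# candidate environments, a posterior over them, and an agent that follows
# the Bayes-optimal action unless its posterior is still too uncertain,
# in which case it takes an information-seeking action instead.

# Hypothetical environment class: two-armed Bernoulli bandits with known
# per-hypothesis reward probabilities. Arm 0 distinguishes the hypotheses;
# arm 1 does not.
HYPOTHESES = [
    {"name": "nu1", "probs": [0.9, 0.5]},
    {"name": "nu2", "probs": [0.1, 0.5]},
]

def posterior_update(posterior, action, reward):
    """Bayes update of the posterior after observing a Bernoulli reward."""
    unnormalised = []
    for p, nu in zip(posterior, HYPOTHESES):
        likelihood = nu["probs"][action] if reward else 1 - nu["probs"][action]
        unnormalised.append(p * likelihood)
    z = sum(unnormalised)
    return [p / z for p in unnormalised]

def bayes_optimal_action(posterior):
    """Action maximising posterior-expected immediate reward (a myopic
    stand-in for the Bayes-optimal policy)."""
    n_actions = len(HYPOTHESES[0]["probs"])
    values = [sum(p * nu["probs"][a] for p, nu in zip(posterior, HYPOTHESES))
              for a in range(n_actions)]
    return max(range(n_actions), key=values.__getitem__)

def exploring_action(posterior):
    """Information-seeking stand-in: pick the action whose outcome the
    hypotheses disagree on most."""
    n_actions = len(HYPOTHESES[0]["probs"])
    def disagreement(a):
        ps = [nu["probs"][a] for nu in HYPOTHESES]
        return max(ps) - min(ps)
    return max(range(n_actions), key=disagreement)

def run(true_env, horizon=50, epsilon=0.05, seed=0):
    rng = random.Random(seed)
    posterior = [1.0 / len(HYPOTHESES)] * len(HYPOTHESES)
    total = 0
    for _ in range(horizon):
        # Explore while no hypothesis dominates the posterior; exploit after.
        uncertain = 1.0 - max(posterior) > epsilon
        a = exploring_action(posterior) if uncertain else bayes_optimal_action(posterior)
        reward = 1 if rng.random() < true_env["probs"][a] else 0
        posterior = posterior_update(posterior, a, reward)
        total += reward
    return total, posterior

if __name__ == "__main__":
    total, posterior = run(HYPOTHESES[0])
    print("total reward:", total,
          "posterior:", [round(p, 3) for p in posterior])

In this toy the exploration trigger is posterior concentration; a more faithful trigger in this style of algorithm would weigh the expected value of the information gained against the reward forgone by not exploiting.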

Cite

Text

Lattimore and Hutter. "Bayesian Reinforcement Learning with Exploration." International Conference on Algorithmic Learning Theory, 2014. doi:10.1007/978-3-319-11662-4_13

Markdown

[Lattimore and Hutter. "Bayesian Reinforcement Learning with Exploration." International Conference on Algorithmic Learning Theory, 2014.](https://mlanthology.org/alt/2014/lattimore2014alt-bayesian/) doi:10.1007/978-3-319-11662-4_13

BibTeX

@inproceedings{lattimore2014alt-bayesian,
  title     = {{Bayesian Reinforcement Learning with Exploration}},
  author    = {Lattimore, Tor and Hutter, Marcus},
  booktitle = {International Conference on Algorithmic Learning Theory},
  year      = {2014},
  pages     = {170--184},
  doi       = {10.1007/978-3-319-11662-4_13},
  url       = {https://mlanthology.org/alt/2014/lattimore2014alt-bayesian/}
}