Near-Bayesian Exploration in Polynomial Time

Abstract

We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm, and prove that with high probability it is able to perform epsilon-close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps. The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results into the setting of Bayesian RL. In this setting, we show that we are able to achieve lower sample complexity bounds than existing PAC-MDP algorithms, while using exploration strategies that are much greedier than the (extremely cautious) exploration strategies used by these existing algorithms.
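The exploration-bonus flavor of near-Bayesian exploration described above can be sketched in a few lines. This is an illustrative sketch only: the bonus form `beta / (1 + n(s, a))`, the function name `beb_policy`, and all constants are assumptions for illustration, not details taken from the abstract. The idea is to act greedily in the *mean* MDP implied by Dirichlet pseudo-counts, after adding a reward bonus that shrinks as a state-action pair is visited more often.

```python
import numpy as np

def beb_policy(counts, rewards, beta=2.0, gamma=0.95, iters=200):
    """Greedy policy for the mean MDP with an exploration bonus (sketch).

    counts[s, a, s'] are Dirichlet pseudo-counts over transitions
    (prior plus observations); rewards[s, a] are known mean rewards.
    The bonus beta / (1 + n(s, a)) decays with the visit count, so
    rarely tried actions look temporarily more attractive.
    """
    n_sa = counts.sum(axis=2)            # total pseudo-counts per (s, a)
    P = counts / n_sa[:, :, None]        # mean (expected) transition model
    R = rewards + beta / (1.0 + n_sa)    # bonus-augmented rewards
    V = np.zeros(counts.shape[0])
    Q = R.copy()
    for _ in range(iters):               # value iteration on the mean MDP
        Q = R + gamma * (P @ V)          # (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

# Hypothetical 2-state, 2-action example: action 0 in state 0 has been
# tried often, so the greedy policy prefers the less-explored action 1.
counts = np.ones((2, 2, 2))              # uniform Dirichlet prior
counts[0, 0] += 10.0                     # many observations for (s=0, a=0)
rewards = np.zeros((2, 2))
policy, values = beb_policy(counts, rewards)
```

Because the bonus vanishes as counts grow, exploration can be much greedier than the cautious optimism of classical PAC-MDP algorithms while still, per the paper's analysis, tracking the Bayesian-optimal policy closely.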

Cite

Text

Kolter and Ng. "Near-Bayesian Exploration in Polynomial Time." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553441

Markdown

[Kolter and Ng. "Near-Bayesian Exploration in Polynomial Time." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/kolter2009icml-near/) doi:10.1145/1553374.1553441

BibTeX

@inproceedings{kolter2009icml-near,
  title     = {{Near-Bayesian Exploration in Polynomial Time}},
  author    = {Kolter, J. Zico and Ng, Andrew Y.},
  booktitle = {International Conference on Machine Learning},
  year      = {2009},
  pages     = {513--520},
  doi       = {10.1145/1553374.1553441},
  url       = {https://mlanthology.org/icml/2009/kolter2009icml-near/}
}