Near-Bayesian Exploration in Polynomial Time

Abstract

We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm, and prove that with high probability it is able to perform epsilon-close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps. The algorithm and analysis are motivated by the so-called PAC-MDP approach, and extend such results into the setting of Bayesian RL. In this setting, we show that we are able to achieve lower sample complexity bounds than existing PAC-MDP algorithms, while using exploration strategies that are much greedier than the (extremely cautious) exploration strategies used by these existing algorithms.
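The exploration-bonus flavor of near-Bayesian exploration described above can be sketched in a few lines. This is an illustrative sketch only: the bonus form `beta / (1 + n(s, a))`, the function name `beb_policy`, and all constants are assumptions for illustration, not details taken from the abstract. The idea is to act greedily in the *mean* MDP implied by Dirichlet pseudo-counts, after adding a reward bonus that shrinks as a state-action pair is visited more often.

```python
import numpy as np

def beb_policy(counts, rewards, beta=2.0, gamma=0.95, iters=200):
    """Greedy policy for the mean MDP with an exploration bonus (sketch).

    counts[s, a, s'] are Dirichlet pseudo-counts over transitions
    (prior plus observations); rewards[s, a] are known mean rewards.
    The bonus beta / (1 + n(s, a)) decays with the visit count, so
    rarely tried actions look temporarily more attractive.
    """
    n_sa = counts.sum(axis=2)            # total pseudo-counts per (s, a)
    P = counts / n_sa[:, :, None]        # mean (expected) transition model
    R = rewards + beta / (1.0 + n_sa)    # bonus-augmented rewards
    V = np.zeros(counts.shape[0])
    Q = R.copy()
    for _ in range(iters):               # value iteration on the mean MDP
        Q = R + gamma * (P @ V)          # (S, A, S) @ (S,) -> (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

# Hypothetical 2-state, 2-action example: action 0 in state 0 has been
# tried often, so the greedy policy prefers the less-explored action 1.
counts = np.ones((2, 2, 2))              # uniform Dirichlet prior
counts[0, 0] += 10.0                     # many observations for (s=0, a=0)
rewards = np.zeros((2, 2))
policy, values = beb_policy(counts, rewards)
```

Because the bonus vanishes as counts grow, exploration can be much greedier than the cautious optimism of classical PAC-MDP algorithms while still, per the paper's analysis, tracking the Bayesian-optimal policy closely.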

Cite

Text

Kolter and Ng. "Near-Bayesian Exploration in Polynomial Time." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553441

Markdown

[Kolter and Ng. "Near-Bayesian Exploration in Polynomial Time." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/kolter2009icml-near/) doi:10.1145/1553374.1553441

BibTeX

@inproceedings{kolter2009icml-near,
  title     = {{Near-Bayesian Exploration in Polynomial Time}},
  author    = {Kolter, J. Zico and Ng, Andrew Y.},
  booktitle = {International Conference on Machine Learning},
  year      = {2009},
  pages     = {513--520},
  doi       = {10.1145/1553374.1553441},
  url       = {https://mlanthology.org/icml/2009/kolter2009icml-near/}
}