Bayesian Optimistic Kullback-Leibler Exploration

Abstract

We consider a Bayesian approach to model-based reinforcement learning, where the agent uses a distribution over environment models to find the action that optimally trades off exploration and exploitation. Unfortunately, it is intractable to find the Bayes-optimal solution to the problem except in restricted cases. In this paper, we present BOKLE, a simple algorithm that uses Kullback–Leibler divergence to constrain the set of plausible models for guiding exploration. We provide a formal analysis showing that this algorithm is near Bayes-optimal with high probability. We also show an asymptotic relation between the solution pursued by BOKLE and the well-known Bayesian exploration bonus algorithm. Finally, we present experimental results that clearly demonstrate the exploration efficiency of the algorithm.
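The abstract describes constraining the set of plausible models with a Kullback–Leibler divergence bound. As a minimal illustration of that KL-ball idea only (this is not the paper's algorithm; the function names, the example distributions, and the threshold value are all hypothetical), one can check whether a candidate transition distribution lies within a KL radius of a reference model:

```python
import math

def kl_categorical(p, q):
    """KL divergence D(p || q) between two categorical distributions,
    with the convention 0 * log(0/q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def in_plausible_set(candidate, reference, epsilon):
    """A candidate model counts as 'plausible' if its KL divergence
    from the reference model is at most epsilon (hypothetical radius)."""
    return kl_categorical(candidate, reference) <= epsilon

# Hypothetical example: a reference transition distribution over 3 states
# and a slightly optimistic candidate model.
reference = [0.5, 0.3, 0.2]
candidate = [0.6, 0.25, 0.15]
print(in_plausible_set(candidate, reference, 0.05))  # → True (KL ≈ 0.021)
```

An optimistic exploration scheme in this spirit would then pick, within such a KL ball, the model that maximizes value; the sketch above only shows the membership test.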

Cite

Text

Lee et al. "Bayesian Optimistic Kullback-Leibler Exploration." Machine Learning, 2019. doi:10.1007/s10994-018-5767-4

Markdown

[Lee et al. "Bayesian Optimistic Kullback-Leibler Exploration." Machine Learning, 2019.](https://mlanthology.org/mlj/2019/lee2019mlj-bayesian/) doi:10.1007/s10994-018-5767-4

BibTeX

@article{lee2019mlj-bayesian,
  title     = {{Bayesian Optimistic Kullback-Leibler Exploration}},
  author    = {Lee, Kanghoon and Kim, Geon-hyeong and Ortega, Pedro A. and Lee, Daniel D. and Kim, Kee-Eung},
  journal   = {Machine Learning},
  year      = {2019},
  pages     = {765--783},
  doi       = {10.1007/s10994-018-5767-4},
  volume    = {108},
  url       = {https://mlanthology.org/mlj/2019/lee2019mlj-bayesian/}
}