Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

Abstract

In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We present a refined analysis of the information ratio and show an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time-inhomogeneous reinforcement learning problem, where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments. We then derive concrete bounds on $d_{l_1}$ in a variety of settings, such as tabular, linear, and finite mixtures, and discuss how our results improve on the state of the art.
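For readers unfamiliar with the algorithm being analyzed, Thompson Sampling in episodic RL can be sketched as posterior sampling: each episode, sample an environment from the posterior, act optimally in the sampled environment, and update the posterior with the observed transitions. The toy MDP, Dirichlet prior, and all constants below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP with horizon 3 (illustration only).
S, A, H = 2, 2, 3
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.05, 0.95]]])  # true_P[s, a] = next-state dist.
true_R = np.array([[0.0, 0.1], [0.2, 1.0]])      # rewards R[s, a], assumed known

# Dirichlet posterior over transition distributions, one per (s, a) pair.
counts = np.ones((S, A, S))  # Dirichlet(1, ..., 1) prior

def plan(P):
    """Finite-horizon value iteration; returns a greedy policy per step."""
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = true_R + P @ V          # Q[s, a] under the sampled model
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

for episode in range(200):
    # 1. Sample a plausible environment from the posterior.
    P_sample = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                         for s in range(S)])
    # 2. Act optimally with respect to the sampled model for one episode.
    pi = plan(P_sample)
    s = 0
    for h in range(H):
        a = pi[h, s]
        s_next = rng.choice(S, p=true_P[s, a])
        counts[s, a, s_next] += 1   # 3. Posterior update from the observed transition.
        s = s_next
```

Randomizing over models in this way drives exploration without explicit bonuses; the paper's regret bounds quantify how quickly this procedure's Bayesian regret shrinks as a function of $H$, $T$, and $d_{l_1}$.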

Cite

Text

Moradipari et al. "Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning." Neural Information Processing Systems, 2023.

Markdown

[Moradipari et al. "Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/moradipari2023neurips-improved/)

BibTeX

@inproceedings{moradipari2023neurips-improved,
  title     = {{Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning}},
  author    = {Moradipari, Ahmadreza and Pedramfar, Mohammad and Zini, Modjtaba Shokrian and Aggarwal, Vaneet},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/moradipari2023neurips-improved/}
}