Further Optimal Regret Bounds for Thompson Sampling

Abstract

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have comparable or better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent bound of O( √ NT lnT ) on the expected regret of this algorithm. Our novel martingale-based analysis techniques are conceptually simple, and easily extend to distributions other than the Beta distribution. For the version of Thompson Sampling that uses Gaussian priors, we prove a problem-independent bound of O( √ NT lnN) on the expected regret, and demonstrate the optimality of this bound by providing a matching lower bound. This lower bound of Ω( √ NT lnN) is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of O( √ NT ) for the multi-armed bandit problem. Our near-optimal problem-independent bounds for Thompson Sampling solve a COLT 2012 open problem of Chapelle and Li. Additionally, our techniques simultaneously provide the optimal problem-dependent bound of (1+ ǫ) ∑ i lnT d(μi,μ1) +O(Nǫ2 ) on the expected regret. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [2012b]. Appearing in Proceedings of the 16 International Conference on Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA. Volume 31 of JMLR: W&CP 31. Copyright 2013 by the authors.

Cite

Text

Agrawal and Goyal. "Further Optimal Regret Bounds for Thompson Sampling." International Conference on Artificial Intelligence and Statistics, 2013.

Markdown

[Agrawal and Goyal. "Further Optimal Regret Bounds for Thompson Sampling." International Conference on Artificial Intelligence and Statistics, 2013.](https://mlanthology.org/aistats/2013/agrawal2013aistats-further/)

BibTeX

@inproceedings{agrawal2013aistats-further,
  title     = {{Further Optimal Regret Bounds for Thompson Sampling}},
  author    = {Agrawal, Shipra and Goyal, Navin},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2013},
  pages     = {99-107},
  url       = {https://mlanthology.org/aistats/2013/agrawal2013aistats-further/}
}