Meta-Thompson Sampling

Abstract

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of broader interest, because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by an empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.
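The abstract describes the core loop of MetaTS: maintain a meta-posterior over the unknown prior, sample a prior from it at the start of each bandit task, run Thompson sampling within the task under that sampled prior, and update the meta-posterior from the task's rewards. Below is a minimal Python sketch of this idea in the Gaussian setting the paper analyzes. It is not the paper's exact implementation: all parameter names and values (K, n, m, sigma, sigma_0, sigma_q) are illustrative assumptions, and the meta-posterior update summarizes each task's pulls of an arm by their sample mean, which is treated as a Gaussian observation of the prior mean.

import numpy as np

rng = np.random.default_rng(0)

K = 5          # number of arms (assumed)
n = 200        # rounds per bandit task (assumed)
m = 50         # number of tasks (assumed)
sigma = 1.0    # reward noise std, assumed known
sigma_0 = 0.5  # std of the task prior N(mu_star, sigma_0^2 I)
sigma_q = 1.0  # std of the meta-prior over mu_star

mu_star = rng.normal(0.0, sigma_q, K)  # unknown prior mean (hidden from the learner)

# Meta-posterior over mu_star, factored per arm: N(q_mean[i], q_var[i]).
q_mean = np.zeros(K)
q_var = np.full(K, sigma_q ** 2)

for s in range(m):
    mu_task = rng.normal(mu_star, sigma_0)  # draw a bandit instance from the prior

    # MetaTS step 1: sample a prior mean from the current meta-posterior.
    mu_s = rng.normal(q_mean, np.sqrt(q_var))

    # Within-task Thompson sampling with per-arm prior N(mu_s[i], sigma_0^2).
    post_mean, post_var = mu_s.copy(), np.full(K, sigma_0 ** 2)
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(n):
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(mu_task[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        # Conjugate Gaussian posterior update for the pulled arm.
        post_var[arm] = 1.0 / (1.0 / sigma_0 ** 2 + counts[arm] / sigma ** 2)
        post_mean[arm] = post_var[arm] * (mu_s[arm] / sigma_0 ** 2 + sums[arm] / sigma ** 2)

    # MetaTS step 2: update the meta-posterior from this task's data.
    # Per arm, the sample mean y_bar is (marginalizing out mu_task) a noisy
    # observation of mu_star with variance sigma_0^2 + sigma^2 / count.
    pulled = counts > 0
    y_bar = sums[pulled] / counts[pulled]
    obs_var = sigma_0 ** 2 + sigma ** 2 / counts[pulled]
    new_var = 1.0 / (1.0 / q_var[pulled] + 1.0 / obs_var)
    q_mean[pulled] = new_var * (q_mean[pulled] / q_var[pulled] + y_bar / obs_var)
    q_var[pulled] = new_var

print("meta-posterior mean:", np.round(q_mean, 2))
print("true prior mean:    ", np.round(mu_star, 2))

Running the sketch, the meta-posterior mean drifts toward the hidden prior mean over tasks, so later tasks start Thompson sampling from a better-informed prior; this is the adaptation to the unknown prior that the abstract refers to.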

Cite

Text

Kveton et al. "Meta-Thompson Sampling." International Conference on Machine Learning, 2021.

Markdown

[Kveton et al. "Meta-Thompson Sampling." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/kveton2021icml-metathompson/)

BibTeX

@inproceedings{kveton2021icml-metathompson,
  title     = {{Meta-Thompson Sampling}},
  author    = {Kveton, Branislav and Konobeev, Mikhail and Zaheer, Manzil and Hsu, Chih-Wei and Mladenov, Martin and Boutilier, Craig and Szepesvari, Csaba},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {5884--5893},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/kveton2021icml-metathompson/}
}