An Information-Theoretic Analysis of Thompson Sampling

Abstract

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
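
As a rough guide to the form of bound the abstract refers to (the notation below is our own gloss, not quoted from the paper): if the information ratio of Thompson sampling is bounded above by \bar{\Gamma} at every step, then over a horizon of T periods

\mathbb{E}\big[\mathrm{Regret}(T)\big] \;\le\; \sqrt{\bar{\Gamma}\, H(\alpha_1)\, T},

where H(\alpha_1) denotes the entropy of the prior distribution of the optimal action. The entropy factor is what ties the regret guarantee to how uncertain the decision-maker is, a priori, about which action is best.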

Cite

Text

Russo and Van Roy. "An Information-Theoretic Analysis of Thompson Sampling." Journal of Machine Learning Research, 2016.

Markdown

[Russo and Van Roy. "An Information-Theoretic Analysis of Thompson Sampling." Journal of Machine Learning Research, 2016.](https://mlanthology.org/jmlr/2016/russo2016jmlr-informationtheoretic/)

BibTeX

@article{russo2016jmlr-informationtheoretic,
  title     = {{An Information-Theoretic Analysis of Thompson Sampling}},
  author    = {Russo, Daniel and Van Roy, Benjamin},
  journal   = {Journal of Machine Learning Research},
  year      = {2016},
  pages     = {1--30},
  volume    = {17},
  url       = {https://mlanthology.org/jmlr/2016/russo2016jmlr-informationtheoretic/}
}