An Information-Theoretic Analysis of Thompson Sampling
Abstract
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
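For readers unfamiliar with the algorithm analyzed in the paper, the following is a minimal sketch of Thompson sampling for a Bernoulli bandit with independent Beta(1, 1) priors. It is an illustration under stated assumptions, not code from the paper; the arm means, horizon, and variable names are hypothetical.

```python
# Minimal Thompson sampling sketch for a Bernoulli bandit.
# Illustrative only; arm means and horizon are made-up values.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # hypothetical arm success probabilities
T = 1000                                  # horizon
successes = np.ones(len(true_means))      # Beta posterior alpha parameters
failures = np.ones(len(true_means))       # Beta posterior beta parameters

regret = 0.0
for t in range(T):
    # Sample a mean for each arm from its posterior and act greedily
    # with respect to the sample -- this is the Thompson sampling step.
    sampled_means = rng.beta(successes, failures)
    arm = int(np.argmax(sampled_means))
    reward = rng.random() < true_means[arm]
    # Conjugate Beta-Bernoulli update of the chosen arm's posterior.
    successes[arm] += reward
    failures[arm] += 1 - reward
    regret += true_means.max() - true_means[arm]

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

The probability that an arm is selected equals the posterior probability that it is optimal, which is the property the paper's information-theoretic analysis exploits.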
Cite
Text
Russo and Van Roy. "An Information-Theoretic Analysis of Thompson Sampling." Journal of Machine Learning Research, 2016.
Markdown
[Russo and Van Roy. "An Information-Theoretic Analysis of Thompson Sampling." Journal of Machine Learning Research, 2016.](https://mlanthology.org/jmlr/2016/russo2016jmlr-informationtheoretic/)
BibTeX
@article{russo2016jmlr-informationtheoretic,
  title     = {{An Information-Theoretic Analysis of Thompson Sampling}},
  author    = {Russo, Daniel and Van Roy, Benjamin},
  journal   = {Journal of Machine Learning Research},
  year      = {2016},
  volume    = {17},
  pages     = {1--30},
  url       = {https://mlanthology.org/jmlr/2016/russo2016jmlr-informationtheoretic/}
}