Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

Abstract

The information-theoretic analysis by Russo and Van Roy [2014], in combination with minimax duality, has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques in online convex optimisation. Besides this, we improve the best known regret guarantees for $k$-armed adversarial bandits, online linear optimisation on $\ell_p$-balls, and bandits with graph feedback.
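For readers unfamiliar with the central quantity, the following is a minimal sketch of the information-ratio bound in the style of Russo and Van Roy [2014]; the notation ($A^*$ for the optimal action, $\Delta_t$ for the instantaneous regret of the played action $A_t$ with observation $Y_t$, $I_t$ for mutual information under the posterior at round $t$) is assumed here and is not quoted from the paper above.

$$
\Gamma_t \;=\; \frac{\big(\mathbb{E}_t[\Delta_t]\big)^2}{I_t\big(A^*;\,(A_t, Y_t)\big)},
\qquad
\mathbb{E}\big[\mathrm{Regret}_T\big] \;\le\; \sqrt{\bar{\Gamma}\, H(A^*)\, T}
\quad \text{whenever } \Gamma_t \le \bar{\Gamma} \text{ for all } t \le T.
$$

For $k$-armed bandits, for instance, Thompson sampling satisfies $\Gamma_t \le k/2$ (Russo and Van Roy), which combined with $H(A^*) \le \log(k)$ gives the familiar $\sqrt{kT\log(k)/2}$ regret bound; the abstract's claim is that bounds of this kind can be recovered, and in several settings improved, using mirror-descent techniques.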

Cite

Text

Zimmert and Lattimore. "Connections Between Mirror Descent, Thompson Sampling and the Information Ratio." Neural Information Processing Systems, 2019.

Markdown

[Zimmert and Lattimore. "Connections Between Mirror Descent, Thompson Sampling and the Information Ratio." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/zimmert2019neurips-connections/)

BibTeX

@inproceedings{zimmert2019neurips-connections,
  title     = {{Connections Between Mirror Descent, Thompson Sampling and the Information Ratio}},
  author    = {Zimmert, Julian and Lattimore, Tor},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {11973--11982},
  url       = {https://mlanthology.org/neurips/2019/zimmert2019neurips-connections/}
}