Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

Zimmert, Julian; Lattimore, Tor

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

NeurIPS 2019 pp. 11973-11982

/neurips/2019/zimmert2019neurips-connections/

Abstract

The information-theoretic analysis by Russo and Van Roy [2014] in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications are derived from existing techniques from online convex optimisation. Besides this, we improve best known regret guarantees for $k$-armed adversarial bandits, online linear optimisation on $\ell_p$-balls and bandits with graph feedback.

PDF NeurIPS Semantic Scholar

Cite

Text

Zimmert and Lattimore. "Connections Between Mirror Descent, Thompson Sampling and the Information Ratio." Neural Information Processing Systems, 2019.

Markdown

[Zimmert and Lattimore. "Connections Between Mirror Descent, Thompson Sampling and the Information Ratio." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/zimmert2019neurips-connections/)

BibTeX

@inproceedings{zimmert2019neurips-connections,
  title     = {{Connections Between Mirror Descent, Thompson Sampling and the Information Ratio}},
  author    = {Zimmert, Julian and Lattimore, Tor},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {11973-11982},
  url       = {https://mlanthology.org/neurips/2019/zimmert2019neurips-connections/}
}