A Minimax and Asymptotically Optimal Algorithm for Stochastic Bandits

Abstract

We propose the $\text{kl-UCB}^{++}$ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.
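The paper's $\text{kl-UCB}^{++}$ index is a refinement of the kl-UCB family: each arm is scored by the largest mean compatible (in KL divergence) with its empirical average, under an exploration rate tuned for both minimax and asymptotic optimality. As a rough illustration only, here is a minimal sketch of a kl-UCB-style index for Bernoulli arms, solved by bisection; the exploration threshold `log(T/n)` is a simplified stand-in, not the exact exploration function of $\text{kl-UCB}^{++}$ from the paper.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q), with arguments clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mu_hat, pulls, horizon, tol=1e-6):
    """Largest q >= mu_hat such that pulls * kl(mu_hat, q) <= threshold.

    Uses bisection, exploiting that kl(mu_hat, .) is increasing on
    [mu_hat, 1].  The threshold log(horizon / pulls) is a simplified
    exploration rate (an assumption for this sketch).
    """
    threshold = math.log(max(horizon / pulls, 1.0))
    lo, hi = mu_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mu_hat, mid) <= threshold:
            lo = mid  # mid is still a plausible mean; move up
        else:
            hi = mid  # mid is statistically excluded; move down
    return lo
```

At each round the policy pulls the arm with the largest index; as an arm accumulates pulls, its confidence bonus shrinks toward the empirical mean.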

Cite

Text

Ménard and Garivier. "A Minimax and Asymptotically Optimal Algorithm for Stochastic Bandits." Proceedings of the 28th International Conference on Algorithmic Learning Theory, 2017.

Markdown

[Ménard and Garivier. "A Minimax and Asymptotically Optimal Algorithm for Stochastic Bandits." Proceedings of the 28th International Conference on Algorithmic Learning Theory, 2017.](https://mlanthology.org/alt/2017/menard2017alt-minimax/)

BibTeX

@inproceedings{menard2017alt-minimax,
  title     = {{A Minimax and Asymptotically Optimal Algorithm for Stochastic Bandits}},
  author    = {Ménard, Pierre and Garivier, Aurélien},
  booktitle = {Proceedings of the 28th International Conference on Algorithmic Learning Theory},
  year      = {2017},
  pages     = {223--237},
  volume    = {76},
  url       = {https://mlanthology.org/alt/2017/menard2017alt-minimax/}
}