Minimax Policies for Adversarial and Stochastic Bandits
Abstract
We fill in a long open gap in the characterization of the minimax rate for the multi-armed bandit problem. Concretely, we remove an extraneous logarithmic factor in the previously known upper bound and propose a new family of randomized algorithms based on an implicit normalization, as well as a new analysis. We also consider the stochastic case, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
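The modified upper confidence bound policy referred to in the abstract is the paper's MOSS index, which replaces UCB1's log(t) exploration term with log(n / (K s)), truncated at zero, where n is the horizon, K the number of arms, and s the play count of the arm. Below is a minimal illustrative sketch of that index in Python, not the authors' reference implementation; the known horizon n, rewards in [0, 1], and the pull(i) environment callback are assumptions introduced here for the example.

import math
import random

def moss(n, K, pull):
    """Sketch of the MOSS index policy.

    n: horizon (total number of plays), K: number of arms,
    pull(i): assumed callback returning a reward in [0, 1] for arm i.
    """
    counts = [0] * K   # number of times each arm has been played
    sums = [0.0] * K   # cumulative reward of each arm
    rewards = []

    for t in range(n):
        if t < K:
            i = t      # play each arm once to initialize
        else:
            # MOSS index: empirical mean plus an exploration bonus whose
            # log term is log(n / (K * s)) truncated at zero, instead of
            # UCB1's log(t).
            def index(i):
                s = counts[i]
                bonus = math.sqrt(max(math.log(n / (K * s)), 0.0) / s)
                return sums[i] / s + bonus
            i = max(range(K), key=index)
        r = pull(i)
        counts[i] += 1
        sums[i] += r
        rewards.append(r)
    return rewards

# Toy usage with Bernoulli arms (illustrative only):
means = [0.4, 0.5, 0.6]
history = moss(n=10000, K=3, pull=lambda i: float(random.random() < means[i]))

The truncation max(log(n / (K s)), 0) is what removes the extraneous logarithmic factor from the distribution-free regret bound while preserving a distribution-dependent rate logarithmic in n.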
Cite
Text
Audibert and Bubeck. "Minimax Policies for Adversarial and Stochastic Bandits." Annual Conference on Computational Learning Theory, 2009.
Markdown
[Audibert and Bubeck. "Minimax Policies for Adversarial and Stochastic Bandits." Annual Conference on Computational Learning Theory, 2009.](https://mlanthology.org/colt/2009/audibert2009colt-minimax/)
BibTeX
@inproceedings{audibert2009colt-minimax,
title = {{Minimax Policies for Adversarial and Stochastic Bandits}},
author = {Audibert, Jean-Yves and Bubeck, Sébastien},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2009},
url = {https://mlanthology.org/colt/2009/audibert2009colt-minimax/}
}