Optimal Best Arm Identification with Fixed Confidence

Abstract

We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis.

Cite

Text

Garivier and Kaufmann. "Optimal Best Arm Identification with Fixed Confidence." Annual Conference on Computational Learning Theory, 2016.

Markdown

[Garivier and Kaufmann. "Optimal Best Arm Identification with Fixed Confidence." Annual Conference on Computational Learning Theory, 2016.](https://mlanthology.org/colt/2016/garivier2016colt-optimal/)

BibTeX

@inproceedings{garivier2016colt-optimal,
  title     = {{Optimal Best Arm Identification with Fixed Confidence}},
  author    = {Garivier, Aurélien and Kaufmann, Emilie},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2016},
  pages     = {998-1027},
  url       = {https://mlanthology.org/colt/2016/garivier2016colt-optimal/}
}