Almost Optimal Exploration in Multi-Armed Bandits

Abstract

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in today's large-scale applications. We present two novel, parameter-free algorithms for identifying the best arm in two different settings: given a target confidence, and given a target budget of arm pulls. For both algorithms we prove upper bounds whose gap from the lower bound is only doubly-logarithmic in the problem parameters. We corroborate our theoretical results with experiments demonstrating that our algorithms outperform the state-of-the-art and scale better as the size of the problem increases.
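To make the fixed-budget setting concrete, below is a minimal Python sketch of Sequential Halving, the fixed-budget algorithm introduced in this paper: the budget is split evenly across roughly log2(n) elimination rounds, each surviving arm is pulled equally often within a round, and the empirically worse half is discarded. The simulation setup (the `pull` callback and the Bernoulli-arm example) is illustrative, not part of the paper.

import math
import random

def sequential_halving(pull, n_arms, budget):
    """Identify the best arm using a fixed budget of arm pulls.

    `pull(i)` returns one stochastic reward for arm i. The budget is
    split evenly across ceil(log2(n_arms)) elimination rounds; in each
    round every surviving arm is pulled equally often and the
    empirically worse half is discarded.
    """
    arms = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        # This round's share of the budget, spread over the survivors.
        pulls_per_arm = max(1, budget // (len(arms) * rounds))
        means = {i: sum(pull(i) for _ in range(pulls_per_arm)) / pulls_per_arm
                 for i in arms}
        # Keep the empirically better half (ties broken arbitrarily).
        arms = sorted(arms, key=means.get, reverse=True)[:max(1, len(arms) // 2)]
        if len(arms) == 1:
            break
    return arms[0]

# Example: 8 Bernoulli arms, arm 0 has the highest mean reward.
true_means = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
best = sequential_halving(lambda i: float(random.random() < true_means[i]),
                          n_arms=8, budget=4000)
print(best)  # 0 with high probability

Note that the algorithm is parameter-free in the sense highlighted by the abstract: it takes only the budget itself, with no confidence level or gap parameters to tune.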

Cite

Text

Karnin et al. "Almost Optimal Exploration in Multi-Armed Bandits." International Conference on Machine Learning, 2013.

Markdown

[Karnin et al. "Almost Optimal Exploration in Multi-Armed Bandits." International Conference on Machine Learning, 2013.](https://mlanthology.org/icml/2013/karnin2013icml-almost/)

BibTeX

@inproceedings{karnin2013icml-almost,
  title     = {{Almost Optimal Exploration in Multi-Armed Bandits}},
  author    = {Karnin, Zohar and Koren, Tomer and Somekh, Oren},
  booktitle = {International Conference on Machine Learning},
  year      = {2013},
  pages     = {1238--1246},
  volume    = {28},
  url       = {https://mlanthology.org/icml/2013/karnin2013icml-almost/}
}