Anytime Exploration for Multi-Armed Bandits Using Confidence Information

Abstract

We introduce anytime Explore-m, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the top-m arms at every time step. Anytime Explore-m is more practical than fixed-budget or fixed-confidence formulations of the top-m problem, since many applications involve a finite, but unpredictable, budget. However, the development and analysis of anytime algorithms present many challenges. We propose AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m. Our analysis shows that the sample complexity of AT-LUCB is competitive with anytime variants of existing algorithms. Moreover, our empirical evaluation shows that AT-LUCB performs as well as or better than state-of-the-art baseline methods for anytime Explore-m.
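To make the setting concrete, the following is a minimal sketch of an LUCB-style top-m routine of the kind the abstract builds on: at every round it predicts the empirically best m arms and then samples the two "critical" arms whose confidence intervals overlap the decision boundary. The function names, the Hoeffding-style radius, and the `delta` parameter are illustrative assumptions, not the paper's exact AT-LUCB rule (which, among other things, adapts its confidence level over time to obtain the anytime guarantee).

```python
import math
import random

def confidence_radius(t, n, delta=0.1):
    # Hoeffding-style deviation radius; the paper's exact bound differs,
    # so treat this form as an illustrative placeholder.
    return math.sqrt(math.log(1.25 * t / delta) / (2 * n))

def lucb_step(means, counts, t, m, delta=0.1):
    """One LUCB-style round: return the current top-m guess plus the two
    critical arms to sample next (lowest LCB inside the guess, highest
    UCB outside it)."""
    k = len(means)
    rad = [confidence_radius(t, counts[i], delta) for i in range(k)]
    top = sorted(range(k), key=lambda i: means[i], reverse=True)[:m]
    rest = [i for i in range(k) if i not in top]
    low = min(top, key=lambda i: means[i] - rad[i])    # weakest in the guess
    high = max(rest, key=lambda i: means[i] + rad[i])  # strongest challenger
    return top, low, high

# Demo on five Bernoulli arms. The anytime property the paper targets is
# that `top` is a valid prediction at every round t, not only at a
# pre-specified budget or stopping time.
random.seed(0)
probs = [0.9, 0.8, 0.5, 0.4, 0.3]  # true (hidden) means, for simulation only
m, k = 2, len(probs)
counts = [1] * k
sums = [1.0 if random.random() < p else 0.0 for p in probs]  # one pull each
for t in range(2, 2001):
    means = [s / c for s, c in zip(sums, counts)]
    top, low, high = lucb_step(means, counts, t, m)
    for i in (low, high):  # sample only the two critical arms
        counts[i] += 1
        sums[i] += 1.0 if random.random() < probs[i] else 0.0
print(sorted(top))
```

After a couple of thousand rounds on well-separated arms, the prediction settles on the two arms with the highest true means; the sampling effort concentrates on the arms straddling the top-m boundary rather than being spread uniformly.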

Cite

Text

Jun and Nowak. "Anytime Exploration for Multi-Armed Bandits Using Confidence Information." International Conference on Machine Learning, 2016.

Markdown

[Jun and Nowak. "Anytime Exploration for Multi-Armed Bandits Using Confidence Information." International Conference on Machine Learning, 2016.](https://mlanthology.org/icml/2016/jun2016icml-anytime/)

BibTeX

@inproceedings{jun2016icml-anytime,
  title     = {{Anytime Exploration for Multi-Armed Bandits Using Confidence Information}},
  author    = {Jun, Kwang-Sung and Nowak, Robert},
  booktitle = {International Conference on Machine Learning},
  year      = {2016},
  pages     = {974--982},
  volume    = {48},
  url       = {https://mlanthology.org/icml/2016/jun2016icml-anytime/}
}