On Interruptible Pure Exploration in Multi-Armed Bandits

Abstract

Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs, which allows for adapting recent advances in non-interruptible strategies to the interruptible setting, while guaranteeing exponential-rate performance improvement over time. Our experimental evaluation demonstrates that the corresponding instances of DB compete favorably both with the currently popular strategies UCB1 and Epsilon-Greedy and with conservative uniform sampling.
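To make the interruptible pure-exploration setting concrete, the sketch below simulates the three baseline sampling rules named in the abstract (uniform, Epsilon-Greedy, UCB1) on Bernoulli arms, recommending the empirically best arm at an arbitrary interruption point and scoring the recommendation by simple regret. This is a minimal illustration of the evaluation setting only, not the paper's DB algorithm; the arm means, budget, and epsilon value are arbitrary assumptions.

```python
import math
import random


def run(strategy, probs, budget, eps=0.1, seed=0):
    """Sample Bernoulli arms for `budget` rounds with the given strategy,
    then recommend the arm with the highest empirical mean (the standard
    recommendation rule when the process is interrupted).
    Returns the simple regret of that recommendation."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        if strategy == "uniform":
            a = t % k  # round-robin over the arms
        elif strategy == "eps":
            # Epsilon-Greedy: explore uniformly with prob. eps,
            # otherwise exploit the current empirical best arm
            if 0 in counts or rng.random() < eps:
                a = rng.randrange(k)
            else:
                a = max(range(k), key=lambda i: sums[i] / counts[i])
        elif strategy == "ucb1":
            # UCB1: play each arm once, then maximize the upper bound
            if 0 in counts:
                a = counts.index(0)
            else:
                a = max(range(k), key=lambda i: sums[i] / counts[i]
                        + math.sqrt(2 * math.log(t + 1) / counts[i]))
        else:
            raise ValueError(strategy)
        counts[a] += 1
        sums[a] += 1.0 if rng.random() < probs[a] else 0.0
    best = max(range(k), key=lambda i: sums[i] / counts[i])
    return max(probs) - probs[best]  # simple regret of the recommendation


# Illustrative arm means and budget (arbitrary choices for the demo)
probs = [0.3, 0.5, 0.45, 0.2]
results = {}
for s in ("uniform", "eps", "ucb1"):
    regrets = [run(s, probs, budget=200, seed=r) for r in range(50)]
    results[s] = sum(regrets) / len(regrets)
    print(s, results[s])
```

Note that UCB1 optimizes cumulative regret rather than simple regret, which is exactly the mismatch that motivates dedicated pure-exploration strategies such as DB.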

Cite

Text

Shleyfman et al. "On Interruptible Pure Exploration in Multi-Armed Bandits." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9688

Markdown

[Shleyfman et al. "On Interruptible Pure Exploration in Multi-Armed Bandits." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/shleyfman2015aaai-interruptible/) doi:10.1609/AAAI.V29I1.9688

BibTeX

@inproceedings{shleyfman2015aaai-interruptible,
  title     = {{On Interruptible Pure Exploration in Multi-Armed Bandits}},
  author    = {Shleyfman, Alexander and Komenda, Antonín and Domshlak, Carmel},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {3592--3598},
  doi       = {10.1609/AAAI.V29I1.9688},
  url       = {https://mlanthology.org/aaai/2015/shleyfman2015aaai-interruptible/}
}