A Non-Asymptotic Approach to Best-Arm Identification for Gaussian Bandits

Abstract

We propose a new strategy for fixed-confidence best-arm identification with Gaussian variables having bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is, to the best of our knowledge, the first strategy with non-asymptotic bounds that asymptotically matches the optimal sample complexity. Its main advantage over other algorithms such as Track-and-Stop, however, is an improved behavior regarding exploration: Exploration-Biased Sampling is biased towards exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are enabled by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe to be of independent interest.
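For context, the sample complexity optimization problem mentioned above is the characteristic-time problem of Garivier and Kaufmann (2016): maximize, over allocations w in the simplex, the worst-case discrimination rate against each suboptimal arm, which for unit-variance Gaussian arms has the closed form w0·wa/(w0+wa)·(μ0−μa)²/2. The sketch below brute-forces this problem on a simplex grid. It is purely illustrative of the optimization problem, not the paper's faster resolution scheme; the function names and grid-search approach are my own.

```python
def simplex_grid(K, n):
    """Yield all integer compositions of n into K parts (a simplex grid)."""
    if K == 1:
        yield (n,)
        return
    for i in range(n + 1):
        for rest in simplex_grid(K - 1, n - i):
            yield (i,) + rest

def objective(w, mu):
    """Inverse characteristic time for unit-variance Gaussian arms:
    min over suboptimal arms a of w0*wa/(w0+wa) * (mu0 - mua)^2 / 2,
    where arm 0 is assumed to be the unique best arm."""
    vals = []
    for a in range(1, len(mu)):
        if w[0] + w[a] == 0.0:
            return 0.0
        vals.append(w[0] * w[a] / (w[0] + w[a]) * (mu[0] - mu[a]) ** 2 / 2)
    return min(vals)

def optimal_allocation(mu, n=100):
    """Search for the allocation maximizing the objective over a simplex
    grid with step 1/n (brute force; illustrative only)."""
    best_w, best_val = None, -1.0
    for comp in simplex_grid(len(mu), n):
        w = tuple(c / n for c in comp)
        val = objective(w, mu)
        if val > best_val:
            best_val, best_w = val, w
    return best_w, best_val
```

For two arms with gap 1, the optimum is the uniform allocation (1/2, 1/2) with value 1/8; with more arms, suboptimal arms closer to the best arm receive larger shares of the budget, reflecting that they are harder to discriminate.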

Cite

Text

Barrier et al. "A Non-Asymptotic Approach to Best-Arm Identification for Gaussian Bandits." Artificial Intelligence and Statistics, 2022.

Markdown

[Barrier et al. "A Non-Asymptotic Approach to Best-Arm Identification for Gaussian Bandits." Artificial Intelligence and Statistics, 2022.](https://mlanthology.org/aistats/2022/barrier2022aistats-nonasymptotic/)

BibTeX

@inproceedings{barrier2022aistats-nonasymptotic,
  title     = {{A Non-Asymptotic Approach to Best-Arm Identification for Gaussian Bandits}},
  author    = {Barrier, Antoine and Garivier, Aurélien and Kocák, Tomáš},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2022},
  pages     = {10078--10109},
  volume    = {151},
  url       = {https://mlanthology.org/aistats/2022/barrier2022aistats-nonasymptotic/}
}