Bounded Regret in Stochastic Multi-Armed Bandits

Abstract

We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a regret *uniformly bounded over time* in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible if one only knows $\mu^{(\star)}$.
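To make the setting concrete, here is a minimal simulation sketch of how knowledge of $\mu^{(\star)}$ and $\Delta$ can be exploited: an elimination-style policy discards any arm whose empirical mean falls clearly below $\mu^{(\star)} - \Delta/2$ once it has been pulled enough times, so suboptimal arms are played only finitely often. This is an illustrative toy policy written for this page, not the paper's randomized policy; the function name, the pull threshold $8/\Delta^2$, and the Bernoulli reward model are all assumptions made for the example.

```python
import random

def bounded_regret_sketch(means, mu_star, delta, horizon, seed=0):
    """Toy elimination policy (NOT the paper's algorithm): an arm whose
    empirical mean drops below mu_star - delta/2 after enough pulls is
    discarded, so only the optimal arm is played in the long run."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    active = set(range(k))
    regret = 0.0
    for t in range(horizon):
        # round-robin over the surviving arms
        arm = sorted(active)[t % len(active)]
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]
        # once an arm has ~8/delta^2 pulls, its empirical mean is accurate
        # to within delta/2 with high probability (Hoeffding); eliminate it
        # if it then looks clearly suboptimal
        if len(active) > 1 and counts[arm] >= 8 / delta**2:
            if sums[arm] / counts[arm] < mu_star - delta / 2:
                active.discard(arm)
    return regret

# Example: two Bernoulli arms with means 0.5 and 0.3, so mu_star = 0.5
# and the gap is 0.2; the regret stops growing once arm 1 is eliminated.
r = bounded_regret_sketch([0.5, 0.3], mu_star=0.5, delta=0.2, horizon=5000)
```

The key point the example mirrors is that the elimination test uses the known value $\mu^{(\star)}$ directly as a reference, rather than comparing arms against each other, which is what makes a time-uniform regret bound conceivable.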

Cite

Text

Bubeck et al. "Bounded Regret in Stochastic Multi-Armed Bandits." Annual Conference on Computational Learning Theory, 2013.

Markdown

[Bubeck et al. "Bounded Regret in Stochastic Multi-Armed Bandits." Annual Conference on Computational Learning Theory, 2013.](https://mlanthology.org/colt/2013/bubeck2013colt-bounded/)

BibTeX

@inproceedings{bubeck2013colt-bounded,
  title     = {{Bounded Regret in Stochastic Multi-Armed Bandits}},
  author    = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2013},
  pages     = {122--134},
  url       = {https://mlanthology.org/colt/2013/bubeck2013colt-bounded/}
}