Thresholding Bandit with Optimal Aggregate Regret

Abstract

We consider the thresholding bandit problem, in which the goal is to identify the arms whose mean rewards lie above a given threshold $\theta$, using a fixed budget of $T$ trials. We introduce LSA, a new, simple, and anytime algorithm that aims to minimize the aggregate regret (i.e., the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results demonstrating the algorithm's superior performance over existing algorithms across a variety of scenarios.
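The LSA sampling rule itself is not described on this page, so the sketch below only illustrates the problem setup from the abstract: a uniform-allocation (round-robin) baseline that spends the budget $T$ evenly over the arms, classifies each arm by its empirical mean against $\theta$, and measures the aggregate regret as the number of mis-classified arms. All function names and the unit-variance Gaussian reward model are illustrative assumptions, not part of the paper.

```python
import random

def thresholding_uniform(means, theta, T, seed=0):
    """Uniform-allocation baseline for the thresholding bandit problem.

    Spends the budget T round-robin over the arms, observes unit-variance
    Gaussian rewards (an assumed reward model), and classifies each arm as
    above/below the threshold theta by its empirical mean.
    Returns the set of arm indices classified as above theta.
    """
    rng = random.Random(seed)
    K = len(means)
    totals = [0.0] * K
    counts = [0] * K
    for t in range(T):
        i = t % K  # round-robin allocation of the fixed budget
        totals[i] += rng.gauss(means[i], 1.0)
        counts[i] += 1
    return {i for i in range(K) if counts[i] and totals[i] / counts[i] > theta}

def aggregate_regret(means, theta, predicted_above):
    """Number of mis-classified arms: the quantity the paper's algorithm minimizes."""
    truth = {i for i, m in enumerate(means) if m > theta}
    return len(truth ^ predicted_above)  # symmetric difference = mis-classifications
```

With well-separated means and a generous budget, even this naive baseline classifies every arm correctly; the interesting regime studied in the paper is when gaps $|\mu_i - \theta|$ are small relative to the budget, where an adaptive rule such as LSA outperforms uniform allocation.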

Cite

Text

Tao et al. "Thresholding Bandit with Optimal Aggregate Regret." Neural Information Processing Systems, 2019.

Markdown

[Tao et al. "Thresholding Bandit with Optimal Aggregate Regret." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/tao2019neurips-thresholding/)

BibTeX

@inproceedings{tao2019neurips-thresholding,
  title     = {{Thresholding Bandit with Optimal Aggregate Regret}},
  author    = {Tao, Chao and Blanco, Saúl and Peng, Jian and Zhou, Yuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {11664--11673},
  url       = {https://mlanthology.org/neurips/2019/tao2019neurips-thresholding/}
}