Thresholding Bandit with Optimal Aggregate Regret
Abstract
We consider the thresholding bandit problem, whose goal is to identify the arms whose mean rewards lie above a given threshold $\theta$, using a fixed budget of $T$ trials. We introduce LSA, a new, simple, and anytime algorithm that aims to minimize the aggregate regret (i.e., the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results demonstrating the algorithm's superior performance over existing algorithms under a variety of scenarios.
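To make the problem setting concrete, here is a minimal sketch of the thresholding bandit with Bernoulli rewards, using a simple uniform (round-robin) allocation of the budget as a baseline classifier. This is an illustration of the problem and of aggregate regret only; it is not the paper's LSA algorithm, and the function name and parameters are hypothetical.

```python
import random

def thresholding_bandit_uniform(means, theta, T, seed=0):
    """Toy baseline for the thresholding bandit: spread the budget T
    uniformly over the arms, then declare each arm "above threshold"
    iff its empirical mean exceeds theta.  (Uniform allocation is a
    naive baseline, not the paper's LSA algorithm.)"""
    rng = random.Random(seed)
    n = len(means)
    sums = [0.0] * n
    counts = [0] * n
    for t in range(T):
        i = t % n  # round-robin over the arms
        # Bernoulli reward with success probability means[i]
        sums[i] += 1.0 if rng.random() < means[i] else 0.0
        counts[i] += 1
    decisions = [sums[i] / counts[i] > theta for i in range(n)]
    # aggregate regret (on this sample path) = number of mis-classified arms
    mistakes = sum(d != (m > theta) for d, m in zip(decisions, means))
    return decisions, mistakes

# Example: threshold 0.5; arms far from 0.5 are easy, nearby arms are hard.
means = [0.1, 0.4, 0.6, 0.9]
decisions, mistakes = thresholding_bandit_uniform(means, theta=0.5, T=2000)
```

With a budget of 2000 pulls over four arms, each arm receives 500 samples, so the clearly-separated arms (0.1 and 0.9) are classified correctly with overwhelming probability; most of the classification error comes from the arms near the threshold, which is why adaptive algorithms such as LSA allocate more samples there.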
Cite
Text
Tao et al. "Thresholding Bandit with Optimal Aggregate Regret." Neural Information Processing Systems, 2019.

Markdown

[Tao et al. "Thresholding Bandit with Optimal Aggregate Regret." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/tao2019neurips-thresholding/)

BibTeX
@inproceedings{tao2019neurips-thresholding,
title = {{Thresholding Bandit with Optimal Aggregate Regret}},
author = {Tao, Chao and Blanco, Saúl and Peng, Jian and Zhou, Yuan},
booktitle = {Neural Information Processing Systems},
year = {2019},
  pages = {11664--11673},
url = {https://mlanthology.org/neurips/2019/tao2019neurips-thresholding/}
}