Sparse Stochastic Bandits
Abstract
In the classical multi-armed bandit problem, $d$ arms are available to a decision maker who pulls them sequentially in order to maximize their cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with $d$ (or with $\sqrt{d}$ in the minimax sense). We consider here the \emph{sparse} case of this classical problem, in the sense that only a small number of arms, namely $s < d$, have a \emph{positive} expected reward. We leverage this additional assumption to provide an algorithm whose regret scales with $s$ instead of $d$. Moreover, we prove that this algorithm is optimal by providing a matching lower bound, at least for a wide and relevant range of parameters that we determine, and we evaluate its performance on simulated data.
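The abstract does not spell out the algorithm itself. As a minimal sketch of how the positivity assumption can be exploited, the following hypothetical Python routine runs UCB1 while permanently eliminating arms whose upper confidence bound falls below zero; the Gaussian reward model, the elimination rule, and the function name `sparse_ucb` are assumptions for illustration, not the authors' method.

```python
import numpy as np

def sparse_ucb(means, horizon, seed=0):
    """UCB1 variant that discards arms whose upper confidence bound
    drops below zero. Hypothetical sketch exploiting that only the
    s arms with positive expected reward matter; not the paper's
    exact algorithm."""
    rng = np.random.default_rng(seed)
    d = len(means)
    counts = np.zeros(d)
    sums = np.zeros(d)
    active = list(range(d))
    total = 0.0
    for t in range(1, horizon + 1):
        # Play each active arm once before trusting confidence bounds.
        fresh = [a for a in active if counts[a] == 0]
        if fresh:
            a = fresh[0]
        else:
            ucb = {a: sums[a] / counts[a] + np.sqrt(2 * np.log(t) / counts[a])
                   for a in active}
            # An arm with a negative UCB is, with high probability, not
            # one of the s arms with positive mean: eliminate it
            # (but never eliminate the last remaining arms).
            survivors = [a for a in active if ucb[a] >= 0]
            active = survivors or active
            a = max(active, key=ucb.get)
        r = rng.normal(means[a], 1.0)  # unit-variance Gaussian rewards, for illustration
        counts[a] += 1
        sums[a] += r
        total += r
    return total
```

For instance, with `means = np.append([0.5, 0.3, 0.2], np.full(97, -0.1))` (so $s = 3$ and $d = 100$), calling `sparse_ucb(means, horizon=10_000)` should, with high probability, narrow play down to the three arms with positive mean, which is the intuition behind a regret scaling with $s$ rather than $d$.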
Cite
Text
Kwon et al. "Sparse Stochastic Bandits." Proceedings of the 2017 Conference on Learning Theory, 2017.

Markdown

[Kwon et al. "Sparse Stochastic Bandits." Proceedings of the 2017 Conference on Learning Theory, 2017.](https://mlanthology.org/colt/2017/kwon2017colt-sparse/)

BibTeX
@inproceedings{kwon2017colt-sparse,
title = {{Sparse Stochastic Bandits}},
author = {Kwon, Joon and Perchet, Vianney and Vernade, Claire},
booktitle = {Proceedings of the 2017 Conference on Learning Theory},
year = {2017},
pages = {1269--1270},
volume = {65},
url = {https://mlanthology.org/colt/2017/kwon2017colt-sparse/}
}