Sparse Stochastic Bandits

Abstract

In the classical multi-armed bandit problem, $d$ arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with $d$ (or with $\sqrt{d}$ in the minimax sense). We here consider the *sparse* case of this classical problem in the sense that only a small number of arms, namely $s<d$, have a *positive* expected reward. We are able to leverage this additional assumption to provide an algorithm whose regret scales with $s$ instead of $d$. Moreover, we prove that this algorithm is optimal by providing a matching lower bound, at least for a wide and pertinent range of parameters that we determine, and by evaluating its performance on simulated data.
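
The abstract mentions an evaluation on simulated data. Below is a minimal, hypothetical sketch of the sparse setting, simulated with a standard UCB1 baseline rather than the paper's algorithm; the Gaussian rewards, the arm means, and the horizon are illustrative assumptions.

```python
import numpy as np

def ucb1_regret(means, horizon, rng):
    """Run UCB1 on Gaussian arms; return cumulative pseudo-regret."""
    d = len(means)
    counts = np.zeros(d)
    sums = np.zeros(d)
    best = means.max()
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= d:
            arm = t - 1  # initialization: pull each arm once
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        counts[arm] += 1
        sums[arm] += rng.normal(means[arm], 1.0)  # unit-variance Gaussian noise (assumption)
        regret += best - means[arm]
    return regret

rng = np.random.default_rng(0)
d, s = 100, 5                              # illustrative instance size and sparsity
means = np.zeros(d)                        # d - s arms with zero expected reward
means[:s] = rng.uniform(0.2, 1.0, size=s)  # only s arms have a positive mean
print(f"UCB1 pseudo-regret over T=10000: {ucb1_regret(means, 10_000, rng):.1f}")
```

Since UCB1 must sample every arm at least once and keeps exploring all $d$ of them, its regret grows with $d$ even on such a sparse instance; that gap is what the paper's $s$-dependent algorithm closes.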

Cite

Text

Kwon et al. "Sparse Stochastic Bandits." Proceedings of the 2017 Conference on Learning Theory, 2017.

Markdown

[Kwon et al. "Sparse Stochastic Bandits." Proceedings of the 2017 Conference on Learning Theory, 2017.](https://mlanthology.org/colt/2017/kwon2017colt-sparse/)

BibTeX

@inproceedings{kwon2017colt-sparse,
  title     = {{Sparse Stochastic Bandits}},
  author    = {Kwon, Joon and Perchet, Vianney and Vernade, Claire},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  year      = {2017},
  pages     = {1269--1270},
  volume    = {65},
  url       = {https://mlanthology.org/colt/2017/kwon2017colt-sparse/}
}