Indexed Minimum Empirical Divergence for Unimodal Bandits

Abstract

We consider a stochastic multi-armed bandit problem specified by a set of one-dimensional family exponential distributions endowed with a unimodal structure. The unimodal structure is of practical relevance for several applications. We introduce IMED-UB, an algorithm that exploits provably optimally the unimodal-structure, by adapting to this setting the Indexed Minimum Empirical Divergence (IMED) algorithm introduced by Honda and Takemura (2015). Owing to our proof technique, we are able to provide a concise finite-time analysis of the IMED-UB algorithm, that is simple and yet yields asymptotic optimality. We finally provide numerical experiments showing that IMED-UB competes favorably with the recently introduced state-of-the-art algorithms.

Cite

Text

Saber et al. "Indexed Minimum Empirical Divergence for Unimodal Bandits." Neural Information Processing Systems, 2021.

Markdown

[Saber et al. "Indexed Minimum Empirical Divergence for Unimodal Bandits." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/saber2021neurips-indexed/)

BibTeX

@inproceedings{saber2021neurips-indexed,
  title     = {{Indexed Minimum Empirical Divergence for Unimodal Bandits}},
  author    = {Saber, Hassan and Ménard, Pierre and Maillard, Odalric-Ambrym},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/saber2021neurips-indexed/}
}