Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms
Abstract
We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most $m$ modes. We propose the first known computationally tractable algorithm for solving the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit setting.
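For reference, the Graves-Lai optimization problem mentioned above takes, in its standard form for structured bandits, the following shape; the notation below is a hedged sketch for illustration and is not taken from the paper. Its value $C(\theta)$ gives the asymptotic scaling of the regret, and the optimization is over exploration rates $\alpha$:

\[
C(\theta) \;=\; \min_{\alpha \in \mathbb{R}_{+}^{K}} \;
\sum_{a \neq a^{\star}(\theta)} \alpha_a \bigl(\mu_{a^{\star}(\theta)} - \mu_a\bigr)
\quad \text{s.t.} \quad
\sum_{a=1}^{K} \alpha_a \, d(\mu_a, \lambda_a) \;\ge\; 1
\quad \text{for all } \lambda \in \Lambda(\theta),
\]

where $\Lambda(\theta)$ denotes the set of confusing instances (mean-reward vectors in the structured class that agree with $\mu$ on the optimal arm but have a different optimal arm), $d(\cdot,\cdot)$ is the KL divergence between reward distributions, and any uniformly good algorithm satisfies $\liminf_{T\to\infty} R(T)/\log T \ge C(\theta)$. The difficulty addressed by the paper is computing this value when the structured class is the set of multimodal reward functions with at most $m$ modes.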
Cite
Text
Réveillard and Combes. "Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms." Advances in Neural Information Processing Systems, 2025.
Markdown
[Réveillard and Combes. "Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/reveillard2025neurips-multimodal/)
BibTeX
@inproceedings{reveillard2025neurips-multimodal,
title = {{Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms}},
author = {Réveillard, William and Combes, Richard},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/reveillard2025neurips-multimodal/}
}