Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention

Abstract

We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic innovation: \emph{abstention}. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to \emph{abstain} from accepting the stochastic instantaneous reward before observing it. When opting for abstention, the agent either suffers a fixed regret or gains a guaranteed reward. This added layer of complexity naturally prompts the key question: can we develop algorithms that are both computationally efficient and asymptotically and minimax optimal in this setting? We answer this question in the affirmative by designing and analyzing algorithms whose regrets meet their corresponding information-theoretic lower bounds. Our results offer valuable quantitative insights into the benefits of the abstention option, laying the groundwork for further exploration in other online decision-making problems with such an option. Extensive numerical experiments validate our theoretical results, demonstrating that our approach not only advances theory but also has the potential to deliver significant practical benefits.
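
To make the guaranteed-reward variant of abstention concrete, the following is a minimal sketch, not the authors' algorithm. It only tallies expected regret for a given sequence of decisions. The function name `expected_regret_fixed_reward`, the parameter names, and the choice of benchmark (the larger of the best arm mean and the guaranteed reward) are assumptions made for illustration and are not taken from the abstract.

```python
from typing import Sequence, Tuple


def expected_regret_fixed_reward(
    means: Sequence[float],
    abstain_reward: float,
    decisions: Sequence[Tuple[int, bool]],
) -> float:
    """Cumulative expected regret for a sequence of (arm, abstain) decisions.

    Hypothetical helper: assumes abstaining yields `abstain_reward` for sure,
    while not abstaining yields the chosen arm's mean reward in expectation.
    """
    # Assumed benchmark: the better of the best arm's mean and the sure reward.
    best = max(max(means), abstain_reward)
    total = 0.0
    for arm, abstain in decisions:
        # Abstaining swaps the arm's random reward for the guaranteed one.
        gained = abstain_reward if abstain else means[arm]
        total += best - gained
    return total


# Two arms with illustrative means 0.2 and 0.5, and a guaranteed reward of 0.3.
means = [0.2, 0.5]
decisions = [(0, False), (0, True), (1, False), (1, True)]
regret = expected_regret_fixed_reward(means, 0.3, decisions)
```

In this toy instance, abstaining while pulling the weaker arm lowers that round's expected regret from 0.3 to 0.2, whereas abstaining on the stronger arm raises it from 0 to 0.2. This is one way to see how an abstention option can reduce the cost of exploring poor arms.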

Cite

Text

Yang et al. "Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention." Transactions on Machine Learning Research, 2026.

Markdown

[Yang et al. "Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/yang2026tmlr-asymptotically/)

BibTeX

@article{yang2026tmlr-asymptotically,
  title     = {{Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention}},
  author    = {Yang, Junwen and Jin, Tianyuan and Tan, Vincent Y. F.},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/yang2026tmlr-asymptotically/}
}