Data-Dependent Bounds with $t$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits Using Stability-Penalty Matching

Abstract

Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandits problems have limited adaptivity as they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent or have sub-optimal $O(\sqrt{T\ln{T}})$ worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, best-of-both-worlds and $T$-optimal for multi-armed bandits problems. In particular, we show that real-time SPM obtains bounds with worst-case guarantees of order $O(\sqrt{T})$ in the adversarial regime and $O(\ln{T})$ in the stochastic regime while simultaneously being adaptive to data-dependent quantities such as sparsity, variations, and small losses. Our results are obtained by extending the SPM technique for tuning the learning rates in the follow-the-regularized-leader (FTRL) framework, which further indicates that the combination of SPM and FTRL is a promising approach for proving new adaptive bounds in online learning problems.

Cite

Text

Nguyen et al. "Data-Dependent Bounds with $t$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits Using Stability-Penalty Matching." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.

Markdown

[Nguyen et al. "Data-Dependent Bounds with $t$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits Using Stability-Penalty Matching." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.](https://mlanthology.org/colt/2025/nguyen2025colt-datadependent/)

BibTeX

@inproceedings{nguyen2025colt-datadependent,
  title     = {{Data-Dependent Bounds with $t$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits Using Stability-Penalty Matching}},
  author    = {Nguyen, Quan and Ito, Shinji and Komiyama, Junpei and Nishant, Mehta},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  year      = {2025},
  pages     = {4386-4451},
  volume    = {291},
  url       = {https://mlanthology.org/colt/2025/nguyen2025colt-datadependent/}
}