Adversarial Bandits Against Arbitrary Strategies

Abstract

We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$, the number of switches of the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD, achieving a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which the $T^{2/3}$ factor comes from the variance of the loss estimators. To mitigate the impact of this variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\sqrt{SKT\rho},S\sqrt{KT}\})$, where $\rho$ is a variance term for the loss estimators.
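To illustrate the basic building block referred to in the abstract, the following is a minimal sketch of OMD with a negative-entropy regularizer (an Exp3-style update) using importance-weighted loss estimators. It assumes losses in $[0,1]$ and a fixed learning rate `eta`; the master-base structure and the paper's adaptive learning-rate schedule are not reproduced here.

```python
import numpy as np

def omd_bandit(losses, eta=0.1, rng=None):
    """Minimal OMD sketch for adversarial bandits.

    losses: (T, K) array of adversarial losses in [0, 1].
    eta:    fixed learning rate (the paper uses adaptive rates instead).
    """
    rng = rng or np.random.default_rng(0)
    T, K = losses.shape
    p = np.full(K, 1.0 / K)           # OMD iterate on the probability simplex
    total_loss = 0.0
    for t in range(T):
        arm = rng.choice(K, p=p)      # sample an arm from the current iterate
        loss = losses[t, arm]
        total_loss += loss
        est = np.zeros(K)
        est[arm] = loss / p[arm]      # importance-weighted loss estimator
        # OMD step with negative entropy = multiplicative-weights update
        w = p * np.exp(-eta * est)
        p = w / w.sum()
    return total_loss
```

The $1/p[\mathrm{arm}]$ factor in the estimator is what drives the variance term discussed in the abstract: when the probability of the played arm is small, the estimator can be large, which motivates controlling the variance via adaptive learning rates.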

Cite

Text

Kim and Yun. "Adversarial Bandits Against Arbitrary Strategies." Transactions on Machine Learning Research, 2025.

Markdown

[Kim and Yun. "Adversarial Bandits Against Arbitrary Strategies." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/kim2025tmlr-adversarial/)

BibTeX

@article{kim2025tmlr-adversarial,
  title     = {{Adversarial Bandits Against Arbitrary Strategies}},
  author    = {Kim, Jung-hun and Yun, Se-Young},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/kim2025tmlr-adversarial/}
}