Better Best of Both Worlds Bounds for Bandits with Switching Costs

Abstract

We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer et al., 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves minimax optimal regret bound (up to logarithmic factors) of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting and a bound of $\mathcal{O}(\min\{\log (T)/\Delta^2,T^{2/3}\})$ in the stochastically-constrained regime, both with (unit) switching costs, where $\Delta$ is the gap between the arms. In the stochastically constrained case, our bound improves over previous results due to Rouyer et al., 2021, that achieved regret of $\mathcal{O}(T^{1/3}/\Delta)$. We accompany our results with a lower bound showing that, in general, $\tilde{\mathcal{\Omega}}(\min\{1/\Delta^2,T^{2/3}\})$ switching cost regret is unavoidable in the stochastically-constrained case for algorithms with $\mathcal{O}(T^{2/3})$ worst-case switching cost regret.

Cite

Text

Amir et al. "Better Best of Both Worlds Bounds for Bandits with Switching Costs." Neural Information Processing Systems, 2022.

Markdown

[Amir et al. "Better Best of Both Worlds Bounds for Bandits with Switching Costs." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/amir2022neurips-better/)

BibTeX

@inproceedings{amir2022neurips-better,
  title     = {{Better Best of Both Worlds Bounds for Bandits with Switching Costs}},
  author    = {Amir, Idan and Azov, Guy and Koren, Tomer and Livni, Roi},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/amir2022neurips-better/}
}