Better Best of Both Worlds Bounds for Bandits with Switching Costs
Abstract
We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer et al., 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves minimax optimal regret bound (up to logarithmic factors) of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting and a bound of $\mathcal{O}(\min\{\log (T)/\Delta^2,T^{2/3}\})$ in the stochastically-constrained regime, both with (unit) switching costs, where $\Delta$ is the gap between the arms. In the stochastically constrained case, our bound improves over previous results due to Rouyer et al., 2021, that achieved regret of $\mathcal{O}(T^{1/3}/\Delta)$. We accompany our results with a lower bound showing that, in general, $\tilde{\mathcal{\Omega}}(\min\{1/\Delta^2,T^{2/3}\})$ switching cost regret is unavoidable in the stochastically-constrained case for algorithms with $\mathcal{O}(T^{2/3})$ worst-case switching cost regret.
Cite
Text
Amir et al. "Better Best of Both Worlds Bounds for Bandits with Switching Costs." Neural Information Processing Systems, 2022.Markdown
[Amir et al. "Better Best of Both Worlds Bounds for Bandits with Switching Costs." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/amir2022neurips-better/)BibTeX
@inproceedings{amir2022neurips-better,
title = {{Better Best of Both Worlds Bounds for Bandits with Switching Costs}},
author = {Amir, Idan and Azov, Guy and Koren, Tomer and Livni, Roi},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/amir2022neurips-better/}
}