Tracking Most Significant Arm Switches in Bandits
Abstract
In \emph{bandits with distribution shifts}, one aims to automatically adapt to unknown changes in the reward distribution, and to \emph{restart} exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually \emph{severe}. To resolve this question, we propose a new notion of \emph{significant shift}, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also large aggregate differences in reward over time. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ counts only best arm switches, while at the same time always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of \emph{total variation} $V$ (which aggregates differences over time). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.
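As a quick numerical illustration of how the regret rates in the abstract compare, the sketch below evaluates $\sqrt{LT}$, $\sqrt{ST}$, and $V^{1/3}T^{2/3}$ for some plausible values of $T$, $L$, $S$, and $V$. The specific numbers are hypothetical, chosen only to show that when $S \ll L$ the switch-counting rate can beat both of the other bounds; they are not taken from the paper.

```python
import math

# Illustrative (hypothetical) problem parameters; not from the paper.
T = 100_000   # horizon (number of rounds)
L = 200       # total number of distribution changes
S = 20        # best-arm switches only, S << L
V = 5.0       # total variation of the reward means over time

rate_L = math.sqrt(L * T)            # O(sqrt(LT)): adaptive rate of Auer et al.
rate_S = math.sqrt(S * T)            # O(sqrt(ST)): counts only best-arm switches
rate_V = V ** (1 / 3) * T ** (2 / 3) # O(V^{1/3} T^{2/3}): total-variation rate

print(f"sqrt(LT)       = {rate_L:.0f}")
print(f"sqrt(ST)       = {rate_S:.0f}")
print(f"V^(1/3)T^(2/3) = {rate_V:.0f}")
```

With these values, $\sqrt{ST}$ is the smallest of the three, consistent with the claim that counting only severe changes can yield strictly faster rates than either the change-count bound or the total-variation bound.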
Suk and Kpotufe. "Tracking Most Significant Arm Switches in Bandits." Conference on Learning Theory, 2022.
@inproceedings{suk2022colt-tracking,
title = {{Tracking Most Significant Arm Switches in Bandits}},
author = {Suk, Joe and Kpotufe, Samory},
booktitle = {Conference on Learning Theory},
year = {2022},
pages = {2160--2182},
volume = {178},
url = {https://mlanthology.org/colt/2022/suk2022colt-tracking/}
}