Tracking Most Significant Arm Switches in Bandits

Abstract

In \emph{bandits with distribution shifts}, one aims to automatically adapt to unknown changes in the reward distributions and \emph{restart} exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret of $\sqrt{LT}$, for $T$ rounds and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, when few of the changes in distribution are actually \emph{severe}. To resolve this question, we propose a new notion of \emph{significant shift}, which counts only very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best-arm switches, but also large aggregate differences in reward over time. Our resulting procedure thus adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S \ll L$ counts only best-arm switches, while at the same time always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ rate expressed in terms of the \emph{total variation} $V$ (which aggregates differences over time). Our results are stated in enough generality to also capture non-stochastic adversarial settings.
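To make the rate comparison in the abstract concrete, the following is a minimal numeric sketch; the values of $T$, $S$, and $V$ are hypothetical, chosen to illustrate a regime with few best-arm switches but large total variation, where the switch-based rate is the smaller of the two:

```python
import math

# Toy comparison of the orders of the two regret rates from the abstract,
# with hypothetical values for the horizon T, the number of best-arm
# switches S, and the total variation V.
T = 100_000
S = 5        # few best-arm switches
V = 50.0     # large total variation of the rewards

rate_switches = math.sqrt(S * T)            # order of the O(sqrt(S T)) rate
rate_variation = V ** (1 / 3) * T ** (2 / 3)  # order of the O(V^{1/3} T^{2/3}) rate

# In this regime the switch-based rate is much smaller.
print(rate_switches < rate_variation)  # prints True
```

This only compares the leading orders of the two bounds (constants and log factors omitted); the point of the paper is that the proposed procedure adapts to whichever regime holds without knowing $S$ or $V$ in advance.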

Cite

Text

Suk and Kpotufe. "Tracking Most Significant Arm Switches in Bandits." Conference on Learning Theory, 2022.

Markdown

[Suk and Kpotufe. "Tracking Most Significant Arm Switches in Bandits." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/suk2022colt-tracking/)

BibTeX

@inproceedings{suk2022colt-tracking,
  title     = {{Tracking Most Significant Arm Switches in Bandits}},
  author    = {Suk, Joe and Kpotufe, Samory},
  booktitle = {Conference on Learning Theory},
  year      = {2022},
  pages     = {2160--2182},
  volume    = {178},
  url       = {https://mlanthology.org/colt/2022/suk2022colt-tracking/}
}