Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes
Abstract
We consider the variant of the stochastic multi-armed bandit problem in which the reward distributions may change abruptly several times. In contrast to previous work, we achieve (nearly) optimal minimax regret bounds without knowing the number of changes. For this setting, we propose an algorithm called ADSWITCH and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. Our regret bound is the first optimal bound for an algorithm that is not tuned with respect to the number of changes.
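To make the setting concrete, the following is a minimal sketch of the piecewise-stationary bandit problem the abstract describes, with regret measured against the per-segment optimal arm (the non-stationary benchmark). The epsilon-greedy learner here is a hypothetical baseline for illustration only, not the ADSWITCH algorithm from the paper; the segment means and exploration rate are assumptions.

```python
import random

def simulate(means_per_segment, segment_len, seed=0):
    """Piecewise-stationary Bernoulli bandit: the arm means change
    abruptly between segments. Returns the regret of a naive
    epsilon-greedy learner against the best arm of each segment."""
    rng = random.Random(seed)
    k = len(means_per_segment[0])
    counts = [0] * k        # pulls per arm (over all segments)
    values = [0.0] * k      # running mean reward estimate per arm
    regret = 0.0
    for seg in means_per_segment:
        best = max(seg)     # non-stationary benchmark: best arm now
        for _ in range(segment_len):
            if rng.random() < 0.1:                      # explore
                arm = rng.randrange(k)
            else:                                       # exploit estimate
                arm = max(range(k), key=lambda a: values[a])
            reward = 1.0 if rng.random() < seg[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
            regret += best - seg[arm]                   # expected regret
    return regret

# Two segments whose best arm swaps after the change point.
r = simulate([[0.9, 0.1], [0.1, 0.9]], segment_len=500)
```

Because the naive learner averages rewards over its whole history, it adapts slowly after the change point; algorithms such as ADSWITCH are designed to limit exactly this kind of loss without being told how many changes occur.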
Cite
Text

Auer et al. "Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes." Conference on Learning Theory, 2019.

Markdown

[Auer et al. "Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes." Conference on Learning Theory, 2019.](https://mlanthology.org/colt/2019/auer2019colt-adaptively/)

BibTeX
@inproceedings{auer2019colt-adaptively,
title = {{Adaptively Tracking the Best Bandit Arm with an Unknown Number of Distribution Changes}},
author = {Auer, Peter and Gajane, Pratik and Ortner, Ronald},
booktitle = {Conference on Learning Theory},
year = {2019},
pages = {138--158},
volume = {99},
url = {https://mlanthology.org/colt/2019/auer2019colt-adaptively/}
}