Achieving Optimal Dynamic Regret for Non-Stationary Bandits Without Prior Information
Abstract
This joint extended abstract introduces and compares the results of Auer et al. (2019) and Chen et al. (2019), both of which resolve the problem of achieving optimal dynamic regret for non-stationary bandits without prior information on the non-stationarity. Specifically, Auer et al. (2019) resolve the problem for the traditional multi-armed bandit setting, while Chen et al. (2019) give a solution for the more general contextual bandit setting. Both works extend the key idea of Auer et al. (2018), originally developed for a simpler two-armed setting.
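To make the setting concrete, below is a minimal, self-contained Python sketch (not the algorithm of either paper): a piecewise-stationary two-armed Bernoulli bandit in which dynamic regret is measured against the best arm at each round, together with a toy restart rule in the spirit of Auer et al. (2018) that keeps occasionally sampling the eliminated arm and restarts once it no longer looks worse, taking that as evidence of a distribution change. The horizon, change point, re-check rate, and confidence radius are all illustrative assumptions.

```python
# Illustrative sketch only: a piecewise-stationary two-armed Bernoulli
# bandit and a simplified eliminate-then-monitor restart rule. Constants
# and the restart test are assumptions, not the published algorithm.
import math
import random

def run(horizon=20000, seed=0):
    rng = random.Random(seed)

    # Piecewise-stationary means: the better arm switches at T/2 (assumed).
    def means(t):
        return (0.7, 0.4) if t < horizon // 2 else (0.3, 0.6)

    # Hoeffding-style confidence radius (illustrative choice).
    def radius(n):
        return math.sqrt(2.0 * math.log(horizon) / max(n, 1))

    counts = [0, 0]        # pulls per arm since the last restart
    sums = [0.0, 0.0]      # reward sums per arm since the last restart
    eliminated = None      # index of the arm currently ruled out, if any
    dynamic_regret = 0.0

    for t in range(horizon):
        if eliminated is None:
            arm = t % 2                      # alternate while both arms are live
        elif rng.random() < 0.05:            # occasionally re-check the loser
            arm = eliminated
        else:
            arm = 1 - eliminated

        mu = means(t)
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Dynamic regret compares against the best arm of the CURRENT round.
        dynamic_regret += max(mu) - mu[arm]

        est = [sums[i] / max(counts[i], 1) for i in range(2)]
        if eliminated is None:
            # Eliminate an arm once the confidence intervals separate.
            gap = est[0] - est[1]
            if abs(gap) > radius(counts[0]) + radius(counts[1]):
                eliminated = 1 if gap > 0 else 0
        elif est[eliminated] > est[1 - eliminated] - radius(counts[eliminated]):
            # The eliminated arm no longer looks worse: treat this as a
            # detected change and restart the estimates from scratch.
            counts, sums, eliminated = [0, 0], [0.0, 0.0], None

    print(f"dynamic regret over {horizon} rounds: {dynamic_regret:.1f}")

if __name__ == "__main__":
    run()
```

The restart test is the crux: without the occasional re-checks of the eliminated arm, the learner would never notice that the arms have swapped and would pay linear regret after the change point.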
Cite

Text:
Auer et al. "Achieving Optimal Dynamic Regret for Non-Stationary Bandits Without Prior Information." Conference on Learning Theory, 2019.

Markdown:
[Auer et al. "Achieving Optimal Dynamic Regret for Non-Stationary Bandits Without Prior Information." Conference on Learning Theory, 2019.](https://mlanthology.org/colt/2019/auer2019colt-achieving/)

BibTeX:
@inproceedings{auer2019colt-achieving,
  title     = {{Achieving Optimal Dynamic Regret for Non-Stationary Bandits Without Prior Information}},
  author    = {Auer, Peter and Chen, Yifang and Gajane, Pratik and Lee, Chung-Wei and Luo, Haipeng and Ortner, Ronald and Wei, Chen-Yu},
  booktitle = {Conference on Learning Theory},
  year      = {2019},
  pages     = {159--163},
  volume    = {99},
  url       = {https://mlanthology.org/colt/2019/auer2019colt-achieving/}
}