Learning to Optimize Under Non-Stationarity
Abstract
We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting, which captures natural applications such as dynamic pricing and ad allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm, which attains optimal dynamic regret, and the tuning-free Bandit-over-Bandit (BOB) framework built on top of SW-UCB, which attains the best dynamic regret among existing results in the literature.
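The abstract only names the algorithm, so as a rough illustration here is a minimal sketch of the sliding-window idea behind SW-UCB, not the paper's exact procedure: estimate the reward parameter by regularized least squares over only the most recent w observations, then play the arm with the highest optimistic score. The function name sw_ucb_step and the parameters w, lam, and beta are hypothetical placeholders, not the paper's tuned values.

import numpy as np

def sw_ucb_step(history, arms, w=500, lam=1.0, beta=1.0):
    """One round of a sliding-window UCB rule for linear bandits (sketch).

    history: list of (x, r) pairs, i.e. the feature vector played and the
    reward observed; only the last w pairs are used, so data from an
    environment that has since drifted is forgotten.
    arms: (K, d) array whose rows are the feature vectors of the K arms.
    """
    recent = history[-w:]
    d = arms.shape[1]
    # Regularized least squares over the window: V = lam*I + sum_t x_t x_t^T.
    V = lam * np.eye(d)
    b = np.zeros(d)
    for x, r in recent:
        V += np.outer(x, x)
        b += r * x
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    # Optimism: estimated reward plus an exploration bonus beta * ||x||_{V^{-1}}.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
    return int(np.argmax(arms @ theta_hat + beta * bonus))

Forgetting data older than the window is what copes with drift: the estimate tracks the current parameter instead of averaging over obsolete ones, so the window length trades bias from non-stationarity against variance from fewer samples. In the paper, the window is tuned using the variation budget, and the BOB framework removes that tuning by running an adversarial bandit over candidate window lengths on top of SW-UCB.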
Cite
Text
Cheung et al. "Learning to Optimize Under Non-Stationarity." Artificial Intelligence and Statistics, 2019.
Markdown
[Cheung et al. "Learning to Optimize Under Non-Stationarity." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/cheung2019aistats-learning/)
BibTeX
@inproceedings{cheung2019aistats-learning,
title = {{Learning to Optimize Under Non-Stationarity}},
author = {Cheung, Wang Chi and Simchi-Levi, David and Zhu, Ruihao},
booktitle = {Artificial Intelligence and Statistics},
year = {2019},
pages = {1079--1087},
volume = {89},
url = {https://mlanthology.org/aistats/2019/cheung2019aistats-learning/}
}