A Simple Approach for Non-Stationary Linear Bandits

Abstract

This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter evolves over time. Previous studies have adopted sophisticated mechanisms, such as a sliding window or a weighted penalty, to achieve near-optimal dynamic regret. In this paper, we demonstrate that a simple restart strategy is sufficient to attain the same regret guarantee. Specifically, we design a UCB-type algorithm to balance exploitation and exploration, and restart it periodically to handle the drift of the unknown parameter. Let $T$ be the time horizon, $d$ the dimension, and $P_T$ the path-length measuring the fluctuation of the evolving unknown parameter; our approach enjoys an $\tilde{O}(d^{2/3}(1+P_T)^{1/3}T^{2/3})$ dynamic regret, which is nearly optimal, matching the $\Omega(d^{2/3}(1+P_T)^{1/3}T^{2/3})$ minimax lower bound up to logarithmic factors. Empirical studies further validate the efficacy of our approach.
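The abstract's idea, a UCB-type linear bandit algorithm that is restarted every fixed number of rounds to cope with parameter drift, can be sketched as follows. This is a minimal illustrative simulation, not the paper's exact procedure: the restart period `H`, the confidence width `beta`, the drift rate, and the synthetic arm set are all hypothetical choices made for the demonstration (the paper tunes the restart period using $T$, $d$, and $P_T$).

```python
import numpy as np

def restarted_linucb(T=2000, d=5, H=200, lam=1.0, beta=1.0,
                     noise=0.1, drift=0.001, seed=0):
    """Sketch of a restarted LinUCB on a slowly drifting parameter.

    Every H rounds the regression statistics (V, b) are reset, so the
    least-squares estimate only uses recent, relevant data.  Returns the
    cumulative mean reward of the algorithm and of a uniform-random
    baseline for comparison.  All constants here are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Fixed finite arm set on the unit sphere (illustrative).
    arms = rng.standard_normal((20, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    # Unknown regression parameter, normalized; it will drift each round.
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)

    alg_reward = rnd_reward = 0.0
    for t in range(T):
        if t % H == 0:
            # Periodic restart: forget stale statistics.
            V = lam * np.eye(d)   # regularized design matrix
            b = np.zeros(d)       # reward-weighted feature sum
        Vinv = np.linalg.inv(V)
        theta_hat = Vinv @ b      # ridge-regression estimate
        # UCB score: estimated reward + exploration bonus per arm.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", arms, Vinv, arms))
        x = arms[np.argmax(arms @ theta_hat + beta * bonus)]
        r = x @ theta + noise * rng.standard_normal()
        V += np.outer(x, x)
        b += r * x
        alg_reward += x @ theta
        rnd_reward += arms[rng.integers(len(arms))] @ theta
        # Slow drift of the unknown parameter (contributes to P_T).
        theta += drift * rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
    return alg_reward, rnd_reward
```

With the seeded defaults, the restarted strategy accumulates substantially more mean reward than the uniform-random baseline, illustrating that forgetting old data is enough to track the drifting parameter.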

Cite

Text

Zhao et al. "A Simple Approach for Non-Stationary Linear Bandits." Artificial Intelligence and Statistics, 2020.

Markdown

[Zhao et al. "A Simple Approach for Non-Stationary Linear Bandits." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/zhao2020aistats-simple/)

BibTeX

@inproceedings{zhao2020aistats-simple,
  title     = {{A Simple Approach for Non-Stationary Linear Bandits}},
  author    = {Zhao, Peng and Zhang, Lijun and Jiang, Yuan and Zhou, Zhi-Hua},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {746--755},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/zhao2020aistats-simple/}
}