An Optimization-Based Algorithm for Non-Stationary Kernel Bandits Without Prior Knowledge

Abstract

We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation. It adapts to non-stationarity by restarting when a change in the reward function is detected. Our algorithm enjoys a tighter dynamic regret bound than previous work on non- stationary kernel bandits. Moreover, when applied to the non-stationary linear bandits by us- ing a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature. We extend our algorithm to use a neural network for dynamically adapting the feature mapping to observed data. We prove a dynamic regret bound of the extension using the neural tangent kernel theory. We demonstrate empirically that our algorithm and the extension can adapt to varying degrees of non-stationarity.

Cite

Text

Hong et al. "An Optimization-Based Algorithm for Non-Stationary Kernel Bandits Without Prior Knowledge." Artificial Intelligence and Statistics, 2023.

Markdown

[Hong et al. "An Optimization-Based Algorithm for Non-Stationary Kernel Bandits Without Prior Knowledge." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/hong2023aistats-optimizationbased/)

BibTeX

@inproceedings{hong2023aistats-optimizationbased,
  title     = {{An Optimization-Based Algorithm for Non-Stationary Kernel Bandits Without Prior Knowledge}},
  author    = {Hong, Kihyuk and Li, Yuhang and Tewari, Ambuj},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {3048-3085},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/hong2023aistats-optimizationbased/}
}