Piecewise Stationary Bandits Under Risk Criteria

Abstract

Piecewise stationary stochastic multi-armed bandits have been extensively explored in the risk-neutral and sub-Gaussian setting. In this work, we consider a multi-armed bandit framework in which the reward distributions are heavy-tailed and non-stationary, and evaluate the performance of algorithms using general risk criteria. Specifically, we make the following contributions: (i) We first propose a non-parametric change detection algorithm that can detect general distributional changes in heavy-tailed distributions. (ii) We then propose a truncation-based UCB-type bandit algorithm that integrates this change detection algorithm to minimize the regret of the non-stationary learning problem. (iii) Finally, we establish regret bounds for the proposed bandit algorithm by characterizing the statistical properties of the change detection algorithm, along with a novel regret analysis.
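The combination described in (ii) — a truncated-mean UCB index paired with a change detector that restarts the arm statistics — can be illustrated with a minimal toy sketch. Everything below is a placeholder instantiation, not the paper's algorithm: the detector simply compares truncated means of the two halves of each arm's reward window, and the truncation level, detection threshold, and confidence bonus are arbitrary choices.

```python
import math
import random

def truncated_mean(samples, b):
    # Clip each sample to [-b, b] before averaging: a standard device for
    # controlling the variance of estimates under heavy-tailed rewards.
    return sum(max(-b, min(b, x)) for x in samples) / len(samples)

def change_detected(samples, b, threshold):
    # Toy nonparametric check (hypothetical, unlike the paper's detector):
    # flag a change if the truncated means of the first and second halves
    # of the window differ by more than `threshold`.
    n = len(samples)
    if n < 20:
        return False
    half = n // 2
    return abs(truncated_mean(samples[:half], b)
               - truncated_mean(samples[half:], b)) > threshold

def truncated_ucb(arms, horizon, b=5.0, threshold=0.6, seed=0):
    # arms: callables (t, rng) -> reward, possibly piecewise stationary in t.
    rng = random.Random(seed)
    windows = [[] for _ in arms]   # per-arm reward history since last restart
    pulls = [0] * len(arms)
    for t in range(horizon):
        def index(i):
            if not windows[i]:
                return float("inf")  # play each arm at least once
            bonus = math.sqrt(2.0 * math.log(t + 1) / len(windows[i]))
            return truncated_mean(windows[i], b) + bonus
        i = max(range(len(arms)), key=index)
        windows[i].append(arms[i](t, rng))
        pulls[i] += 1
        if change_detected(windows[i], b, threshold):
            # Detected a regime change: restart all arm statistics so the
            # index tracks the new stationary segment.
            windows = [[] for _ in arms]
    return pulls
```

On a stationary instance with a clear gap, the index concentrates on the better arm; when an arm's reward distribution shifts mid-run, the halved-window comparison eventually fires and wipes the stale statistics, which is the restart mechanism the regret analysis has to account for.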

Cite

Text

Bhatt et al. "Piecewise Stationary Bandits Under Risk Criteria." Artificial Intelligence and Statistics, 2023.

Markdown

[Bhatt et al. "Piecewise Stationary Bandits Under Risk Criteria." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/bhatt2023aistats-piecewise/)

BibTeX

@inproceedings{bhatt2023aistats-piecewise,
  title     = {{Piecewise Stationary Bandits Under Risk Criteria}},
  author    = {Bhatt, Sujay and Fang, Guanhua and Li, Ping},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {4313--4335},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/bhatt2023aistats-piecewise/}
}