Dynamic Bandits with Temporal Structure

Abstract

In this work, we study a dynamic multi-armed bandit (MAB) problem, where the expected reward of each arm evolves over time following an auto-regressive model. We present an algorithm whose per-round regret upper bound almost matches the regret lower bound, and numerically demonstrate its efficacy in adapting to the changing environment.
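To make the setting concrete, the sketch below simulates a dynamic bandit whose arm means drift according to an AR(1) recursion and runs a simple sliding-window UCB-style baseline against them. This is a minimal illustrative sketch: the AR order, the coefficients gamma and sigma, and the baseline policy are assumptions made here for exposition and are not the model instance or algorithm analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative dynamic bandit environment: each arm's expected reward
# follows an AR(1) process (coefficients and noise scale are assumed,
# not taken from the paper).
K = 5            # number of arms
T = 1000         # horizon
gamma = 0.9      # assumed AR(1) coefficient
sigma = 0.1      # assumed innovation noise scale

mu = rng.uniform(0.2, 0.8, size=K)   # initial expected rewards
history = [[] for _ in range(K)]     # observed rewards per arm
window = 50                          # sliding-window size for the baseline
cum_regret = 0.0

for t in range(T):
    # Arm means evolve via the AR(1) recursion
    # mu_k(t+1) = gamma * mu_k(t) + (1 - gamma) * 0.5 + noise.
    mu = gamma * mu + (1 - gamma) * 0.5 + sigma * rng.standard_normal(K)

    # Index = recent empirical mean + exploration bonus over the window,
    # so estimates can track the drifting means (not the paper's algorithm).
    ucb = np.array([
        (np.mean(h[-window:]) + np.sqrt(2 * np.log(t + 1) / len(h[-window:])))
        if h else np.inf
        for h in history
    ])
    arm = int(np.argmax(ucb))

    reward = mu[arm] + sigma * rng.standard_normal()
    history[arm].append(reward)
    cum_regret += mu.max() - mu[arm]   # dynamic (per-round oracle) regret

print(f"cumulative dynamic regret after {T} rounds: {cum_regret:.2f}")

Discarding observations outside the window is one simple way to adapt to the changing environment; the paper's algorithm and its near-matching regret bounds are developed for the auto-regressive structure specifically.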

Cite

Text

Chen. "Dynamic Bandits with Temporal Structure." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/823

Markdown

[Chen. "Dynamic Bandits with Temporal Structure." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/chen2022ijcai-dynamic/) doi:10.24963/IJCAI.2022/823

BibTeX

@inproceedings{chen2022ijcai-dynamic,
  title     = {{Dynamic Bandits with Temporal Structure}},
  author    = {Chen, Qinyi},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {5841--5842},
  doi       = {10.24963/IJCAI.2022/823},
  url       = {https://mlanthology.org/ijcai/2022/chen2022ijcai-dynamic/}
}