Dynamic Bandits with Temporal Structure

Abstract

In this work, we study a dynamic multi-armed bandit (MAB) problem, where the expected reward of each arm evolves over time following an auto-regressive model. We present an algorithm whose per-round regret upper bound almost matches the regret lower bound, and numerically demonstrate its efficacy in adapting to the changing environment.
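To make the setting concrete, the sketch below simulates a dynamic bandit whose arm means drift according to an AR(1) recursion and runs a simple sliding-window UCB-style baseline against them. This is a minimal illustrative sketch: the AR order, the coefficients gamma and sigma, and the baseline policy are assumptions made here for exposition and are not the model instance or algorithm analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative dynamic bandit environment: each arm's expected reward
# follows an AR(1) process (coefficients and noise scale are assumed,
# not taken from the paper).
K = 5            # number of arms
T = 1000         # horizon
gamma = 0.9      # assumed AR(1) coefficient
sigma = 0.1      # assumed innovation noise scale

mu = rng.uniform(0.2, 0.8, size=K)   # initial expected rewards
history = [[] for _ in range(K)]     # observed rewards per arm
window = 50                          # sliding-window size for the baseline
cum_regret = 0.0

for t in range(T):
    # Arm means evolve via the AR(1) recursion
    # mu_k(t+1) = gamma * mu_k(t) + (1 - gamma) * 0.5 + noise.
    mu = gamma * mu + (1 - gamma) * 0.5 + sigma * rng.standard_normal(K)

    # Index = recent empirical mean + exploration bonus over the window,
    # so estimates can track the drifting means (not the paper's algorithm).
    ucb = np.array([
        (np.mean(h[-window:]) + np.sqrt(2 * np.log(t + 1) / len(h[-window:])))
        if h else np.inf
        for h in history
    ])
    arm = int(np.argmax(ucb))

    reward = mu[arm] + sigma * rng.standard_normal()
    history[arm].append(reward)
    cum_regret += mu.max() - mu[arm]   # dynamic (per-round oracle) regret

print(f"cumulative dynamic regret after {T} rounds: {cum_regret:.2f}")

Discarding observations outside the window is one simple way to adapt to the changing environment; the paper's algorithm and its near-matching regret bounds are developed for the auto-regressive structure specifically.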

Cite

Text

Chen. "Dynamic Bandits with Temporal Structure." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/823

Markdown

[Chen. "Dynamic Bandits with Temporal Structure." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/chen2022ijcai-dynamic/) doi:10.24963/IJCAI.2022/823

BibTeX

@inproceedings{chen2022ijcai-dynamic,
  title     = {{Dynamic Bandits with Temporal Structure}},
  author    = {Chen, Qinyi},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {5841--5842},
  doi       = {10.24963/IJCAI.2022/823},
  url       = {https://mlanthology.org/ijcai/2022/chen2022ijcai-dynamic/}
}