Dynamic Bandits with Temporal Structure
Abstract
In this work, we study a dynamic multi-armed bandit (MAB) problem, where the expected reward of each arm evolves over time following an auto-regressive model. We present an algorithm whose per-round regret upper bound almost matches the regret lower bound, and numerically demonstrate its efficacy in adapting to the changing environment.
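The setting in the abstract can be made concrete with a small simulation. The sketch below is a minimal illustration under assumed specifics: each arm's mean reward follows an AR(1) process (one instance of an auto-regressive model) with coefficient `phi` and Gaussian innovation noise, and the learner is a generic discounted-UCB baseline for non-stationary bandits. It is not the algorithm proposed in the paper; it only shows how the expected rewards drift over time and how a learner that down-weights old observations can track the changing best arm.

```python
# Minimal sketch of a dynamic bandit with auto-regressive reward drift.
# Assumptions (not from the paper): AR(1) reward dynamics, Gaussian noise,
# and a discounted-UCB learner used purely as a generic baseline.

import numpy as np

rng = np.random.default_rng(0)

K = 3          # number of arms
T = 5000       # horizon
phi = 0.99     # AR(1) coefficient (assumed)
sigma = 0.05   # std of the AR(1) innovation noise (assumed)
gamma = 0.99   # discount factor for discounted UCB (assumed)

# Latent mean rewards, evolving as mu_{t+1} = phi * mu_t + noise.
mu = rng.normal(0.5, 0.1, size=K)

# Discounted sufficient statistics per arm.
disc_counts = np.zeros(K)   # discounted pull counts
disc_sums = np.zeros(K)     # discounted reward sums

regret = 0.0
for t in range(T):
    if t < K:
        # Pull each arm once before using the index.
        arm = t
    else:
        means = disc_sums / disc_counts
        total = disc_counts.sum()
        bonus = np.sqrt(2.0 * np.log(total) / disc_counts)
        arm = int(np.argmax(means + bonus))

    # Observe a noisy reward from the chosen arm.
    reward = mu[arm] + rng.normal(0.0, 0.1)
    regret += mu.max() - mu[arm]

    # Discount old statistics, then record the new observation.
    disc_counts *= gamma
    disc_sums *= gamma
    disc_counts[arm] += 1.0
    disc_sums[arm] += reward

    # Environment step: AR(1) drift of every arm's mean reward.
    mu = phi * mu + rng.normal(0.0, sigma, size=K)

print(f"cumulative pseudo-regret over {T} rounds: {regret:.2f}")
```

The discounting plays the role of forgetting: because the means drift, statistics from long ago become stale, so the learner keeps exploring arms whose estimates have decayed.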
Cite
Text
Chen. "Dynamic Bandits with Temporal Structure." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/823Markdown
[Chen. "Dynamic Bandits with Temporal Structure." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/chen2022ijcai-dynamic/) doi:10.24963/IJCAI.2022/823BibTeX
@inproceedings{chen2022ijcai-dynamic,
  title     = {{Dynamic Bandits with Temporal Structure}},
  author    = {Chen, Qinyi},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {5841-5842},
  doi       = {10.24963/IJCAI.2022/823},
  url       = {https://mlanthology.org/ijcai/2022/chen2022ijcai-dynamic/}
}