Routine Bandits: Minimizing Regret on Recurring Problems
Abstract
We study a variant of the multi-armed bandit problem in which a learner faces every day one of $\mathcal {B}$ B many bandit instances, and call it a routine bandit. More specifically, at each period , the same bandit $b^h_\star $ b ⋆ h is considered during $T > 1$ T > 1 consecutive time steps, but the identity $b^h_\star $ b ⋆ h is unknown to the learner. We assume all rewards distribution are Gaussian standard. Such a situation typically occurs in recommender systems when a learner may repeatedly serve the same user whose identity is unknown due to privacy issues. By combining bandit-identification tests with a KLUCB type strategy, we introduce the KLUCB for Routine Bandits ( KLUCB-RB ) algorithm. While independently running KLUCB algorithm at each period leads to a cumulative expected regret of $\varOmega (H \log T)$ Ω ( H log T ) after H many periods when $T \rightarrow \infty $ T → ∞ , KLUCB-RB benefits from previous periods by aggregating observations from similar identified bandits, which yields a non-trivial scaling of $\varOmega (\log T)$ Ω ( log T ) . This is achieved without knowing which bandit instance is being faced by KLUCB-RB on this period, nor knowing a priori the number of possible bandit instances. We provide numerical illustration that confirm the benefit of KLUCB-RB while using less information about the problem compared with existing strategies for similar problems.
Cite
Text
Saber et al. "Routine Bandits: Minimizing Regret on Recurring Problems." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86486-6_1Markdown
[Saber et al. "Routine Bandits: Minimizing Regret on Recurring Problems." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/saber2021ecmlpkdd-routine/) doi:10.1007/978-3-030-86486-6_1BibTeX
@inproceedings{saber2021ecmlpkdd-routine,
title = {{Routine Bandits: Minimizing Regret on Recurring Problems}},
author = {Saber, Hassan and Saci, Léo and Maillard, Odalric-Ambrym and Durand, Audrey},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2021},
pages = {3-18},
doi = {10.1007/978-3-030-86486-6_1},
url = {https://mlanthology.org/ecmlpkdd/2021/saber2021ecmlpkdd-routine/}
}