Routine Bandits: Minimizing Regret on Recurring Problems

Saber, Hassan; Saci, Léo; Maillard, Odalric-Ambrym; Durand, Audrey

doi:10.1007/978-3-030-86486-6_1

Routine Bandits: Minimizing Regret on Recurring Problems

Hassan Saber, Léo Saci, Odalric-Ambrym Maillard, Audrey Durand

ECML-PKDD 2021 pp. 3-18

doi:10.1007/978-3-030-86486-6_1 /ecmlpkdd/2021/saber2021ecmlpkdd-routine/

Abstract

We study a variant of the multi-armed bandit problem in which a learner faces every day one of $\mathcal {B}$ B many bandit instances, and call it a routine bandit. More specifically, at each period , the same bandit $b^h_\star $ b ⋆ h is considered during $T > 1$ T > 1 consecutive time steps, but the identity $b^h_\star $ b ⋆ h is unknown to the learner. We assume all rewards distribution are Gaussian standard. Such a situation typically occurs in recommender systems when a learner may repeatedly serve the same user whose identity is unknown due to privacy issues. By combining bandit-identification tests with a KLUCB type strategy, we introduce the KLUCB for Routine Bandits ( KLUCB-RB ) algorithm. While independently running KLUCB algorithm at each period leads to a cumulative expected regret of $\varOmega (H \log T)$ Ω ( H log T ) after H many periods when $T \rightarrow \infty $ T → ∞ , KLUCB-RB benefits from previous periods by aggregating observations from similar identified bandits, which yields a non-trivial scaling of $\varOmega (\log T)$ Ω ( log T ) . This is achieved without knowing which bandit instance is being faced by KLUCB-RB on this period, nor knowing a priori the number of possible bandit instances. We provide numerical illustration that confirm the benefit of KLUCB-RB while using less information about the problem compared with existing strategies for similar problems.

PDF ECML-PKDD Semantic Scholar

Cite

Text

Saber et al. "Routine Bandits: Minimizing Regret on Recurring Problems." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86486-6_1

Markdown

[Saber et al. "Routine Bandits: Minimizing Regret on Recurring Problems." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/saber2021ecmlpkdd-routine/) doi:10.1007/978-3-030-86486-6_1

BibTeX

@inproceedings{saber2021ecmlpkdd-routine,
  title     = {{Routine Bandits: Minimizing Regret on Recurring Problems}},
  author    = {Saber, Hassan and Saci, Léo and Maillard, Odalric-Ambrym and Durand, Audrey},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {3-18},
  doi       = {10.1007/978-3-030-86486-6_1},
  url       = {https://mlanthology.org/ecmlpkdd/2021/saber2021ecmlpkdd-routine/}
}