Regret Bounds for Batched Bandits

Abstract

We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.

Cite

Text

Esfandiari et al. "Regret Bounds for Batched Bandits." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I8.16901

Markdown

[Esfandiari et al. "Regret Bounds for Batched Bandits." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/esfandiari2021aaai-regret/) doi:10.1609/AAAI.V35I8.16901

BibTeX

@inproceedings{esfandiari2021aaai-regret,
  title     = {{Regret Bounds for Batched Bandits}},
  author    = {Esfandiari, Hossein and Karbasi, Amin and Mehrabian, Abbas and Mirrokni, Vahab S.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {7340-7348},
  doi       = {10.1609/AAAI.V35I8.16901},
  url       = {https://mlanthology.org/aaai/2021/esfandiari2021aaai-regret/}
}