Batched Bandit Problems

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax-optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
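To give a feel for the batching constraint, here is a minimal sketch of a batched policy on a two-armed Bernoulli bandit: arms are pulled only in a few prescribed batches, and the policy may look at rewards only between batches. The elimination rule (drop an arm once confidence intervals separate, then commit) and the geometric batch grid below are illustrative assumptions, not the paper's exact policy or grid.

```python
import math
import random

def batched_two_armed_bandit(means, batch_sizes, delta=0.05, seed=0):
    """Run a simple batched policy on a two-armed Bernoulli bandit.

    Within each batch the surviving arms are pulled in round-robin; rewards
    are examined only at batch boundaries, where an arm is eliminated if its
    empirical mean is separated from the other's by the confidence widths.
    Returns the cumulative (pseudo-)regret.  Illustrative sketch only.
    """
    rng = random.Random(seed)
    counts = [0, 0]          # pulls per arm
    sums = [0.0, 0.0]        # total reward per arm
    surviving = [0, 1]       # arms still in play
    best = max(means)
    regret = 0.0
    for size in batch_sizes:
        # Pull surviving arms round-robin for the whole batch.
        for t in range(size):
            arm = surviving[t % len(surviving)]
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best - means[arm]
        # Between batches: eliminate an arm if intervals separate.
        if len(surviving) == 2:
            mu = [sums[i] / counts[i] for i in (0, 1)]
            width = [math.sqrt(2.0 * math.log(1.0 / delta) / counts[i])
                     for i in (0, 1)]
            if mu[0] - width[0] > mu[1] + width[1]:
                surviving = [0]
            elif mu[1] - width[1] > mu[0] + width[0]:
                surviving = [1]
    return regret

# With a geometrically growing batch grid, the bad arm is typically
# eliminated after an early batch, so almost all pulls go to the winner.
regret = batched_two_armed_bandit((1.0, 0.0), [10, 100, 1000])
```

With the extreme means `(1.0, 0.0)` the rewards are deterministic: the bad arm survives the first 10-pull batch (intervals still overlap) but is eliminated after the second, so regret comes only from its 55 pulls in the first two batches.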

Cite

Text

Perchet et al. "Batched Bandit Problems." Annual Conference on Computational Learning Theory, 2015. doi:10.1214/15-AOS1381

Markdown

[Perchet et al. "Batched Bandit Problems." Annual Conference on Computational Learning Theory, 2015.](https://mlanthology.org/colt/2015/perchet2015colt-batched/) doi:10.1214/15-AOS1381

BibTeX

@inproceedings{perchet2015colt-batched,
  title     = {{Batched Bandit Problems}},
  author    = {Perchet, Vianney and Rigollet, Philippe and Chassang, Sylvain and Snowberg, Erik},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2015},
  pages     = {1456},
  doi       = {10.1214/15-AOS1381},
  url       = {https://mlanthology.org/colt/2015/perchet2015colt-batched/}
}