Batched Bandit Problems
Abstract
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
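The paper's specific policy is not reproduced on this page, but the batching constraint it studies can be illustrated with a minimal sketch: a two-armed Bernoulli bandit played by an explore-then-commit style policy whose pulls are fixed at the start of each batch (all names, parameters, and the two-batch split below are illustrative assumptions, not the authors' algorithm).

```python
import random

def batched_etc(means, T, explore_frac=0.2, seed=0):
    """Two-batch explore-then-commit on a 2-armed Bernoulli bandit.

    Batch 1: pull each arm equally often (the whole batch is fixed in
    advance, so no feedback is used mid-batch). Batch 2: commit to the
    empirically best arm for the remaining rounds. Returns the realized
    pseudo-regret against the best arm. This is an illustrative sketch,
    not the policy from the paper.
    """
    rng = random.Random(seed)
    n_explore = max(1, int(explore_frac * T) // 2)  # pulls per arm in batch 1
    counts = [0, 0]
    sums = [0.0, 0.0]
    best = max(means)
    regret = 0.0

    def pull(arm):
        nonlocal regret
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    # Batch 1: uniform exploration.
    for _ in range(n_explore):
        pull(0)
        pull(1)

    # Batch 2: commit to the arm with the higher empirical mean.
    champ = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
    for _ in range(T - 2 * n_explore):
        pull(champ)

    return regret
```

Because every pull within a batch is decided before the batch starts, the policy also switches arms only a constant number of times, which hints at the connection to low-switching-cost bandits mentioned in the abstract.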
Cite
Text
Perchet et al. "Batched Bandit Problems." Annual Conference on Computational Learning Theory, 2015. doi:10.1214/15-AOS1381
Markdown
[Perchet et al. "Batched Bandit Problems." Annual Conference on Computational Learning Theory, 2015.](https://mlanthology.org/colt/2015/perchet2015colt-batched/) doi:10.1214/15-AOS1381
BibTeX
@inproceedings{perchet2015colt-batched,
title = {{Batched Bandit Problems}},
author = {Perchet, Vianney and Rigollet, Philippe and Chassang, Sylvain and Snowberg, Erik},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2015},
pages = {1456},
doi = {10.1214/15-AOS1381},
url = {https://mlanthology.org/colt/2015/perchet2015colt-batched/}
}