The Batch Complexity of Bandit Pure Exploration

Abstract

In a fixed-confidence pure exploration problem in stochastic multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms distributions. We are interested in batched methods, which change their sampling behaviour only a few times, between batches of observations. We give an instance-dependent lower bound on the number of batches used by any sample efficient algorithm for any pure exploration task. We then give a general batched algorithm and prove upper bounds on its expected sample complexity and batch complexity. We illustrate both lower and upper bounds on best-arm identification and thresholding bandits.

Cite

Text

Tuynman and Degenne. "The Batch Complexity of Bandit Pure Exploration." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Tuynman and Degenne. "The Batch Complexity of Bandit Pure Exploration." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/tuynman2025icml-batch/)

BibTeX

@inproceedings{tuynman2025icml-batch,
  title     = {{The Batch Complexity of Bandit Pure Exploration}},
  author    = {Tuynman, Adrienne and Degenne, Rémy},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {60442-60468},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/tuynman2025icml-batch/}
}