Pure Exploration with Multiple Correct Answers

Abstract

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the existing Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.

Cite

Text

Degenne and Koolen. "Pure Exploration with Multiple Correct Answers." Neural Information Processing Systems, 2019.

Markdown

[Degenne and Koolen. "Pure Exploration with Multiple Correct Answers." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/degenne2019neurips-pure/)

BibTeX

@inproceedings{degenne2019neurips-pure,
  title     = {{Pure Exploration with Multiple Correct Answers}},
  author    = {Degenne, Rémy and Koolen, Wouter M.},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {14591-14600},
  url       = {https://mlanthology.org/neurips/2019/degenne2019neurips-pure/}
}