The Best Arm Evades: Near-Optimal Multi-Pass Streaming Lower Bounds for Pure Exploration in Multi-Armed Bandits

Abstract

We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(n/\Delta^2)$ requires $\Omega(\log{(1/\Delta)}/\log\log{(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches (up to lower-order terms) the $O(\log(1/\Delta))$-pass, $O(1)$-memory algorithm of Jin et al. [ICML’21], and answers an open question posed by Assadi and Wang [STOC’20].
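To make the setting concrete, here is a minimal Python sketch of the general multi-pass, constant-memory streaming paradigm the abstract alludes to: keep a single "king" arm in memory, and in each pass over the stream duel it against every arm at a geometrically finer accuracy. This is an illustrative assumption about the paradigm, not the exact algorithm or analysis of Jin et al. [ICML’21]; the helper names (`pull`, `mean_of`, `multipass_best_arm`) and the constants in the sample budgets are hypothetical choices for the sketch.

```python
import random

def pull(arm_mean):
    """Simulate one Bernoulli reward pull from an arm with the given mean."""
    return 1.0 if random.random() < arm_mean else 0.0

def mean_of(arm_mean, num_pulls):
    """Empirical mean of an arm over num_pulls samples."""
    return sum(pull(arm_mean) for _ in range(num_pulls)) / num_pulls

def multipass_best_arm(arms, num_passes):
    """Multi-pass streaming sketch with O(1) memory: store one 'king' arm.

    In pass r the duel accuracy is eps_r = 2**-r, so each comparison uses
    O(1/eps_r**2) samples; running ~log(1/Delta) passes yields a total
    budget on the order of O(n/Delta**2) while storing a single arm.
    """
    king = arms[0]
    for r in range(1, num_passes + 1):
        eps = 2.0 ** (-r)
        budget = int(4 / eps ** 2)  # pulls per duel; resolves gaps > eps (w.h.p.)
        for challenger in arms:     # one streaming pass over the arms
            if mean_of(challenger, budget) > mean_of(king, budget) + eps / 2:
                king = challenger   # challenger beats the king; replace it
    return king

# Toy usage: gap Delta = 0.1 between the best and second-best arms,
# so ~log2(1/Delta) = 5 passes suffice in this sketch.
random.seed(0)
arms = [0.5] * 49 + [0.6]
random.shuffle(arms)
print(multipass_best_arm(arms, num_passes=5))  # prints 0.6 w.h.p.
```

The lower bound in the paper says this pass count is essentially unavoidable: no sublinear-memory algorithm with the optimal $O(n/\Delta^2)$ sample budget can get away with fewer than $\Omega(\log(1/\Delta)/\log\log(1/\Delta))$ passes.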

Cite

Text

Assadi and Wang. "The Best Arm Evades: Near-Optimal Multi-Pass Streaming Lower Bounds for Pure Exploration in Multi-Armed Bandits." Conference on Learning Theory, 2024.

Markdown

[Assadi and Wang. "The Best Arm Evades: Near-Optimal Multi-Pass Streaming Lower Bounds for Pure Exploration in Multi-Armed Bandits." Conference on Learning Theory, 2024.](https://mlanthology.org/colt/2024/assadi2024colt-best/)

BibTeX

@inproceedings{assadi2024colt-best,
  title     = {{The Best Arm Evades: Near-Optimal Multi-Pass Streaming Lower Bounds for Pure Exploration in Multi-Armed Bandits}},
  author    = {Assadi, Sepehr and Wang, Chen},
  booktitle = {Conference on Learning Theory},
  year      = {2024},
  pages     = {311--358},
  volume    = {247},
  url       = {https://mlanthology.org/colt/2024/assadi2024colt-best/}
}