Tight Regret Bounds for Single-Pass Streaming Multi-Armed Bandits

Abstract

Regret minimization in streaming multi-armed bandits (MABs) has been studied extensively, and recent work has shown that algorithms with $o(K)$ memory must incur $\Omega(T^{2/3})$ regret, where $K$ and $T$ are the numbers of arms and trials, respectively. However, the previous best regret upper bound is still $O(K^{1/3} T^{2/3}\log^{1/3}(T))$, which is achieved by the simple uniform exploration algorithm. In this paper, we close this gap and complete the picture of regret minimization in single-pass streaming MABs. We first improve the regret lower bound to $\Omega(K^{1/3}T^{2/3})$ for algorithms with $o(K)$ memory. We then show that the $\log^{1/3}(T)$ factor is not necessary by designing algorithms that use at most $O(\log^*(K))$-arm memory and achieve $O(K^{1/3}T^{2/3})$ expected regret, building on streaming $\varepsilon$-best arm algorithms. We further test the empirical performance of our algorithms on simulated MAB instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%.
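The benchmark referenced in the abstract is the uniform exploration (explore-then-commit) algorithm, which achieves the $O(K^{1/3} T^{2/3}\log^{1/3}(T))$ upper bound. Below is a minimal Python sketch, not taken from the paper, of this baseline in a single-pass streaming setting: each arriving arm is pulled a fixed number of times and only the best empirical arm seen so far is retained, so only $O(1)$ arms are stored. The function name `uniform_exploration_stream` and the Bernoulli reward model are illustrative assumptions.

```python
# Sketch of the uniform exploration (explore-then-commit) baseline in a
# single-pass streaming MAB. The per-arm pull budget follows the standard
# (T/K)^{2/3} * log^{1/3}(T) schedule, which matches the abstract's
# O(K^{1/3} T^{2/3} log^{1/3} T) regret bound for this algorithm.
import math
import random


def uniform_exploration_stream(arm_means, T, rng=random):
    """Single pass over a stream of Bernoulli arms with a total budget of T pulls.

    arm_means: true means, used here only to simulate rewards.
    Returns the total realized reward.
    """
    K = len(arm_means)
    n_explore = max(1, int((T / K) ** (2 / 3) * math.log(T) ** (1 / 3)))

    total_reward = 0.0
    pulls_used = 0
    best_mean_hat, best_arm = -math.inf, None

    # Streaming pass: pull each arriving arm n_explore times and keep only
    # the empirically best arm so far (O(1)-arm memory).
    for arm, mu in enumerate(arm_means):
        n = min(n_explore, T - pulls_used)
        if n == 0:
            break
        reward_sum = sum(rng.random() < mu for _ in range(n))
        total_reward += reward_sum
        pulls_used += n
        if reward_sum / n > best_mean_hat:
            best_mean_hat, best_arm = reward_sum / n, arm

    # Commit: spend the remaining budget on the retained arm.
    for _ in range(T - pulls_used):
        total_reward += 1.0 if rng.random() < arm_means[best_arm] else 0.0
    return total_reward


# Example usage on a simulated instance (one good arm among ten):
means = [0.5] * 9 + [0.7]
T = 100_000
reward = uniform_exploration_stream(means, T)
print("regret ≈", max(means) * T - reward)
```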

Cite

Text

Wang. "Tight Regret Bounds for Single-Pass Streaming Multi-Armed Bandits." International Conference on Machine Learning, 2023.

Markdown

[Wang. "Tight Regret Bounds for Single-Pass Streaming Multi-Armed Bandits." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/wang2023icml-tight/)

BibTeX

@inproceedings{wang2023icml-tight,
  title     = {{Tight Regret Bounds for Single-Pass Streaming Multi-Armed Bandits}},
  author    = {Wang, Chen},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {35525--35547},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/wang2023icml-tight/}
}