Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

Abstract

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show $\Omega(T^{2/3})$ cumulative regret in expectation for single-pass algorithms for arm-memory size of $(n-1)$, where $n$ is the number of arms. For best-arm identification, we provide an $(\varepsilon, \delta)$-PAC algorithm with arm memory size of $O(\log^*n)$ and $O(\frac{n}{\varepsilon^2}\cdot \log(\frac{1}{\delta}))$ optimal sample complexity.

Cite

Text

Maiti et al. "Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization." Neural Information Processing Systems, 2021.

Markdown

[Maiti et al. "Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/maiti2021neurips-multiarmed/)

BibTeX

@inproceedings{maiti2021neurips-multiarmed,
  title     = {{Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization}},
  author    = {Maiti, Arnab and Patil, Vishakha and Khan, Arindam},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/maiti2021neurips-multiarmed/}
}