Stochastic Multi-Armed Bandits in Constant Space

Abstract

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret \[ \sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T \] where $\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log(1/\Delta))$ factor of the optimum regret possible without space constraints.
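
To make the stated bound concrete, the following minimal sketch evaluates the regret expression from the abstract for a set of arm means and compares it with the reference quantity $\sum_i \frac{1}{\Delta_i}\log T$, which is the order of the optimal regret without space constraints when rewards are bounded away from $0$ and $1$. This is not the paper's algorithm, only arithmetic on the formula. The function names, the example means, and the choice of natural logarithms are illustrative assumptions; constants hidden by the $O(\cdot)$ notation are ignored, and arms tied with the best arm ($\Delta_i = 0$) are skipped, which is also an assumption since the abstract does not say how they are treated.

```python
import math

def suboptimal_gaps(means):
    """Gaps Delta_i for suboptimal arms only (assumption: arms with gap 0 are skipped)."""
    best = max(means)
    deltas = [best - m for m in means if best - m > 0]
    if not deltas:
        raise ValueError("need at least one suboptimal arm")
    return deltas

def constant_space_bound(means, T):
    """sum_i (1/Delta_i) * log(Delta_i/Delta) * log(T), with constants dropped."""
    deltas = suboptimal_gaps(means)
    delta = min(deltas)  # gap between the best and second-best arms
    return sum(math.log(d / delta) / d for d in deltas) * math.log(T)

def unconstrained_reference(means, T):
    """sum_i (1/Delta_i) * log(T), the reference order without space limits."""
    deltas = suboptimal_gaps(means)
    return sum(1.0 / d for d in deltas) * math.log(T)

if __name__ == "__main__":
    means = [0.9, 0.8, 0.6, 0.5]  # hypothetical arm means in [0, 1]
    T = 10_000                    # hypothetical horizon
    bound = constant_space_bound(means, T)
    ref = unconstrained_reference(means, T)
    delta = min(suboptimal_gaps(means))
    print("constant-space expression:", bound)
    print("unconstrained reference:  ", ref)
    print("ratio:", bound / ref, "<= log(1/Delta) =", math.log(1 / delta))
```

Because every gap satisfies $\Delta_i \le 1$ for rewards in $[0, 1]$, each term obeys $\log(\Delta_i/\Delta) \le \log(1/\Delta)$, so the ratio printed above can never exceed $\log(1/\Delta)$. This is the sense in which the bound is within an $O(\log(1/\Delta))$ factor of the reference quantity.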

Cite

Text

Liau et al. "Stochastic Multi-Armed Bandits in Constant Space." International Conference on Artificial Intelligence and Statistics, 2018.

Markdown

[Liau et al. "Stochastic Multi-Armed Bandits in Constant Space." International Conference on Artificial Intelligence and Statistics, 2018.](https://mlanthology.org/aistats/2018/liau2018aistats-stochastic/)

BibTeX

@inproceedings{liau2018aistats-stochastic,
  title     = {{Stochastic Multi-Armed Bandits in Constant Space}},
  author    = {Liau, David and Song, Zhao and Price, Eric and Yang, Ger},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2018},
  pages     = {386--394},
  url       = {https://mlanthology.org/aistats/2018/liau2018aistats-stochastic/}
}