Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate with Collision Information, Sublinear Without

Abstract

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication and no shared randomness at all between the players, and furthermore, when two (or more) players select the same action, this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, assuming only two players, under the feedback model where collisions are announced to the colliding players. We also prove the first sublinear regret guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.
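For concreteness, with $m = 2$ players the collision-blind rate $T^{1-\frac{1}{2m}}$ evaluates to $T^{3/4}$, compared to the $\sqrt{T} = T^{1/2}$ rate available when collisions are announced; the exponent $1-\frac{1}{2m}$ approaches $1$, i.e. near-linear regret, as the number of players grows.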

Cite

Text

Bubeck et al. "Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate with Collision Information, Sublinear Without." Conference on Learning Theory, 2020.

Markdown

[Bubeck et al. "Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate with Collision Information, Sublinear Without." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/bubeck2020colt-nonstochastic/)

BibTeX

@inproceedings{bubeck2020colt-nonstochastic,
  title     = {{Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate with Collision Information, Sublinear Without}},
  author    = {Bubeck, Sébastien and Li, Yuanzhi and Peres, Yuval and Sellke, Mark},
  booktitle = {Conference on Learning Theory},
  year      = {2020},
  pages     = {961--987},
  volume    = {125},
  url       = {https://mlanthology.org/colt/2020/bubeck2020colt-nonstochastic/}
}