Coordination Without Communication: Optimal Regret in Two Players Multi-Armed Bandits

Abstract

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.

Cite

Text

Bubeck and Budzinski. "Coordination Without Communication: Optimal Regret in Two Players Multi-Armed Bandits." Conference on Learning Theory, 2020.

Markdown

[Bubeck and Budzinski. "Coordination Without Communication: Optimal Regret in Two Players Multi-Armed Bandits." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/bubeck2020colt-coordination/)

BibTeX

@inproceedings{bubeck2020colt-coordination,
  title     = {{Coordination Without Communication: Optimal Regret in Two Players Multi-Armed Bandits}},
  author    = {Bubeck, Sébastien and Budzinski, Thomas},
  booktitle = {Conference on Learning Theory},
  year      = {2020},
  pages     = {916-939},
  volume    = {125},
  url       = {https://mlanthology.org/colt/2020/bubeck2020colt-coordination/}
}