Optimal Algorithms for Multiplayer Multi-Armed Bandits

Abstract

The paper addresses various Multiplayer Multi-Armed Bandit (MMAB) problems, where M decision-makers, or players, collaborate to maximize their cumulative reward. We first investigate the MMAB problem where players selecting the same arm experience a collision (and are aware of it) and do not collect any reward. For this problem, we present DPE1 (Decentralized Parsimonious Exploration), a decentralized algorithm that achieves the same asymptotic regret as that obtained by an optimal centralized algorithm. DPE1 is simpler than the state-of-the-art algorithm SIC-MMAB (Boursier and Perchet, 2019), and yet offers better performance guarantees. We then study the MMAB problem without collision, where players may select the same arm. Players sit on the vertices of a graph, and in each round, they are able to send a message to their neighbours in the graph. We present DPE2, a simple and asymptotically optimal algorithm that outperforms the state-of-the-art algorithm DD-UCB (Martinez-Rubio et al., 2019). Moreover, under DPE2, the expected number of bits transmitted by the players in the graph is finite.
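The collision model described in the abstract can be illustrated with a minimal sketch: players who pick the same arm in a round are told they collided and receive zero reward, while a lone player on an arm draws a Bernoulli reward. The arm means, the `play_round` helper, and the injectable `rng` parameter below are all hypothetical choices for illustration, not from the paper.

```python
def play_round(means, choices, rng):
    """One round of the abstract's collision model (illustrative sketch).

    means:   hypothetical Bernoulli mean reward of each arm
    choices: choices[p] is the arm selected by player p
    rng:     callable returning a uniform draw in [0, 1)

    Returns (rewards, collided): per-player rewards and collision flags.
    """
    # Count how many players selected each arm.
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1

    rewards, collided = [], []
    for arm in choices:
        if counts[arm] > 1:
            # Collision: no reward, but the player observes the collision.
            rewards.append(0.0)
            collided.append(True)
        else:
            # Lone player: Bernoulli reward with the arm's mean.
            rewards.append(1.0 if rng() < means[arm] else 0.0)
            collided.append(False)
    return rewards, collided

# Players 0 and 1 collide on arm 2; player 2 plays arm 0 alone.
# A constant rng makes the example deterministic.
rewards, collided = play_round([0.9, 0.5, 0.7], [2, 2, 0], rng=lambda: 0.0)
```

In the paper's first setting, algorithms such as DPE1 must coordinate through these collision observations alone; in the second setting, collisions are removed and players instead exchange messages along graph edges.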

Cite

Text

Wang et al. "Optimal Algorithms for Multiplayer Multi-Armed Bandits." Artificial Intelligence and Statistics, 2020.

Markdown

[Wang et al. "Optimal Algorithms for Multiplayer Multi-Armed Bandits." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/wang2020aistats-optimal/)

BibTeX

@inproceedings{wang2020aistats-optimal,
  title     = {{Optimal Algorithms for Multiplayer Multi-Armed Bandits}},
  author    = {Wang, Po-An and Proutiere, Alexandre and Ariu, Kaito and Jedra, Yassir and Russo, Alessio},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {4120--4129},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/wang2020aistats-optimal/}
}