Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

Abstract

We study the problem of regret minimization for distributed bandit learning, in which $M$ agents work collaboratively to minimize their total regret under the coordination of a central server. Our goal is to design communication protocols with near-optimal regret and low communication cost, measured by the total amount of transmitted data. For distributed multi-armed bandits, we propose a protocol with near-optimal regret and only $O(M\log(MK))$ communication cost, where $K$ is the number of arms. The communication cost is independent of the time horizon $T$, depends only logarithmically on the number of arms, and matches the lower bound up to a logarithmic factor. For distributed $d$-dimensional linear bandits, we propose a protocol that achieves near-optimal regret with communication cost of order $O\left(\left(Md+d\log \log d\right)\log T\right)$, which depends only logarithmically on $T$.

Cite

Text

Wang et al. "Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication." International Conference on Learning Representations, 2020.

Markdown

[Wang et al. "Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/wang2020iclr-distributed/)

BibTeX

@inproceedings{wang2020iclr-distributed,
  title     = {{Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication}},
  author    = {Wang, Yuanhao and Hu, Jiachen and Chen, Xiaoyu and Wang, Liwei},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/wang2020iclr-distributed/}
}