Order-Optimal Regret in Distributed Kernel Bandits Using Uniform Sampling with Shared Randomness

Abstract

We consider distributed kernel bandits, where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function and obtains noisy observations at its query points. The agents can share information through a central server, with the objective of minimizing the regret accumulated over a time horizon $T$ and across agents. We develop the first algorithm that achieves the optimal regret order with a communication cost that is sublinear in both $N$ and $T$. The key features of the proposed algorithm are uniform exploration at the local agents and randomness shared with the central server. Together with a sparse approximation of the Gaussian process (GP) model, these two features make it possible to preserve the learning rate of the centralized setting while the communication rate diminishes over time.
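
The shared-randomness idea from the abstract can be illustrated with a minimal sketch: if the server and the agents draw uniform exploration points from the same random seed, the query points never need to be transmitted, so only the noisy rewards travel over the network. This is not the paper's algorithm (it omits the sparse GP approximation and any regret guarantee); the `Server`, `Agent`, and `uniform_queries` names and the seeding scheme below are hypothetical.

```python
import numpy as np

def uniform_queries(seed: int, num_points: int, dim: int) -> np.ndarray:
    """Uniform exploration points on [0, 1]^dim, reproducible from the seed."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(num_points, dim))

class Server:
    """Hypothetical central server: broadcasts seeds, reconstructs points locally."""
    def __init__(self, dim: int):
        self.dim = dim

    def broadcast_seed(self, epoch: int) -> int:
        # A few bytes per epoch replace sending every query point explicitly.
        return 12345 + epoch

    def reconstruct_points(self, seed: int, num_points: int) -> np.ndarray:
        # Shared randomness: identical to the points each agent sampled locally.
        return uniform_queries(seed, num_points, self.dim)

class Agent:
    """Hypothetical local agent: queries the unknown reward at shared uniform points."""
    def __init__(self, f, noise_std: float = 0.1):
        self.f = f                      # unknown reward function (black box)
        self.noise_std = noise_std

    def explore(self, seed: int, num_points: int, dim: int) -> np.ndarray:
        x = uniform_queries(seed, num_points, dim)   # same points as the server
        y = self.f(x) + self.noise_std * np.random.randn(len(x))
        return y                        # only the noisy rewards are uploaded

# Usage: one exploration epoch with two agents on a toy 1-D reward.
f = lambda x: np.sin(3 * x[:, 0])
server, agents = Server(dim=1), [Agent(f), Agent(f)]
seed = server.broadcast_seed(epoch=0)
rewards = [a.explore(seed, num_points=8, dim=1) for a in agents]
points = server.reconstruct_points(seed, num_points=8)   # no points were sent
print(points.shape, np.mean(rewards, axis=0))
```

In this sketch the per-epoch uplink cost is one reward vector per agent and the downlink cost is a single seed; the paper's communication savings additionally rely on the sparse GP approximation, which is not reproduced here.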

Cite

Text

Pavlovic et al. "Order-Optimal Regret in Distributed Kernel Bandits Using Uniform Sampling with Shared Randomness." NeurIPS 2024 Workshops: BDU, 2024.

Markdown

[Pavlovic et al. "Order-Optimal Regret in Distributed Kernel Bandits Using Uniform Sampling with Shared Randomness." NeurIPS 2024 Workshops: BDU, 2024.](https://mlanthology.org/neuripsw/2024/pavlovic2024neuripsw-orderoptimal/)

BibTeX

@inproceedings{pavlovic2024neuripsw-orderoptimal,
  title     = {{Order-Optimal Regret in Distributed Kernel Bandits Using Uniform Sampling with Shared Randomness}},
  author    = {Pavlovic, Nikola and Salgia, Sudeep and Zhao, Qing},
  booktitle = {NeurIPS 2024 Workshops: BDU},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/pavlovic2024neuripsw-orderoptimal/}
}