A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits

Abstract

This paper studies federated multi-armed bandit (MAB) problems in which multiple agents work together to solve a common MAB problem through a communication network. We focus on the heterogeneous setting in which no single agent can identify the globally best arm using only locally biased observations. In this setting, different agents may select the same arm at the same time step, but receive different rewards. We propose a novel algorithm called \textsc{FedFTRL} for this problem and, to our knowledge, it is the first to achieve near-optimal regret guarantees in both stochastic and adversarial environments. Notably, in the adversarial regime, our algorithm achieves $O(T^{\frac{1}{2}})$ regret, a significant improvement over the state-of-the-art regret of $O(T^{\frac{2}{3}})$ \citep{yi2023doubly}. We also provide empirical evaluations comparing our algorithm with baseline methods, demonstrating the effectiveness of our approach on both synthetic and real-world datasets.

Cite

Text

Hu et al. "A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits." International Conference on Learning Representations, 2026.

Markdown

[Hu et al. "A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hu2026iclr-nearoptimal/)

BibTeX

@inproceedings{hu2026iclr-nearoptimal,
  title     = {{A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits}},
  author    = {Hu, Zicheng and Wang, Zihao and Chen, Cheng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/hu2026iclr-nearoptimal/}
}