Multi-Player Approaches for Dueling Bandits

Abstract

Fine-tuning large deep networks with preference-based human feedback has seen promising results. As user numbers grow and tasks shift to complex datasets like images or videos, distributed approaches become essential for efficiently gathering feedback. To address this, we introduce a multi-player dueling bandit problem, highlighting that exploring non-informative candidate pairs becomes especially challenging in a collaborative environment. We demonstrate that a Follow Your Leader black-box approach matches the asymptotic regret lower bound when utilizing known dueling bandit algorithms as a foundation. Additionally, we propose and analyze a fully distributed message-passing approach with a novel Condorcet-winner recommendation protocol, which expedites exploration in the non-asymptotic regime and thereby reduces regret. Our experimental comparisons reveal that our multi-player algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of this setting.
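The Condorcet winner mentioned above is the arm that beats every other arm in a pairwise duel with probability greater than 1/2. As a minimal sketch of the concept (the preference matrix below is hypothetical and not taken from the paper):

```python
import numpy as np

# Hypothetical preference matrix for 4 arms: P[i, j] is the probability
# that arm i wins a duel against arm j, so P[i, j] + P[j, i] = 1.
P = np.array([
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6, 0.7],
    [0.3, 0.4, 0.5, 0.6],
    [0.2, 0.3, 0.4, 0.5],
])

def condorcet_winner(P):
    """Return the index of the arm that beats every other arm with
    probability > 1/2, or None if no such arm exists."""
    K = P.shape[0]
    for i in range(K):
        if all(P[i, j] > 0.5 for j in range(K) if j != i):
            return i
    return None

print(condorcet_winner(P))  # arm 0 beats all others
```

In the paper's distributed setting, players exchange messages so that an estimated Condorcet winner can be recommended across the network, avoiding redundant exploration of uninformative pairs.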

Cite

Text

Raveh et al. "Multi-Player Approaches for Dueling Bandits." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Raveh et al. "Multi-Player Approaches for Dueling Bandits." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/raveh2025aistats-multiplayer/)

BibTeX

@inproceedings{raveh2025aistats-multiplayer,
  title     = {{Multi-Player Approaches for Dueling Bandits}},
  author    = {Raveh, Or and Honda, Junya and Sugiyama, Masashi},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {1540--1548},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/raveh2025aistats-multiplayer/}
}