Best-of-Three-Worlds Analysis for Dueling Bandits with Borda Winner

Abstract

The dueling bandits (DB) problem addresses online learning from relative preferences, where the learner queries pairs of arms and receives binary win-loss feedback. Most existing work designs algorithms for one specific environment, either stochastic or adversarial. Recently, a unified algorithm has been proposed that achieves convergence across all settings. However, this approach relies on the existence of a Condorcet winner, which often does not exist, particularly when the preference matrix changes over time in the adversarial setting. For the more general Borda winner objective, no unified framework currently achieves optimal regret across these environments simultaneously. In this paper, we explore how the follow-the-regularized-leader (FTRL) algorithm can be employed to achieve this objective. We propose a hybrid negative entropy regularizer and demonstrate that it enables us to achieve $\tilde{O}(K^{1/3} T^{2/3})$ regret in the adversarial setting, $O(K \log^2 T / \Delta_{\min}^2)$ regret in the stochastic setting, and $O(K \log^2 T / \Delta_{\min}^2 + (C^2 K \log^2 T / \Delta_{\min}^2)^{1/3})$ regret in the corrupted setting, where $K$ is the number of arms, $T$ is the horizon, $\Delta_{\min}$ is the minimum gap between the optimal and sub-optimal arms, and $C$ is the corruption level. These results match the state of the art in each individual setting, without requiring the environment type to be known in advance. We also present experimental results demonstrating the advantages of our algorithm over baseline methods across different environments.
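
To illustrate why a Borda winner is a more general target than a Condorcet winner, the sketch below computes both for a small cyclic preference matrix. It is not taken from the paper: the matrix `P`, the helper names `borda_scores` and `condorcet_winner`, and the normalization of the Borda score as the average win probability against the other $K-1$ arms are all assumptions chosen for illustration, and the paper's exact definition may differ.

```python
# Hypothetical example: P[i][j] is the assumed probability that arm i beats arm j.
# The matrix is cyclic (0 beats 1, 1 beats 2, 2 beats 0), so no Condorcet winner exists.
P = [
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.9],
    [0.6, 0.1, 0.5],
]


def borda_scores(P):
    """Average probability of beating each other arm (assumed normalization)."""
    K = len(P)
    return [sum(P[i][j] for j in range(K) if j != i) / (K - 1) for i in range(K)]


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


scores = borda_scores(P)
borda_winner = max(range(len(P)), key=lambda i: scores[i])

print("Borda scores:", scores)
print("Borda winner:", borda_winner)
print("Condorcet winner:", condorcet_winner(P))
```

In this assumed instance every arm loses to some other arm, so `condorcet_winner` returns `None`, while the Borda scores still single out one arm as the best on average.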

Cite

Text

Hu et al. "Best-of-Three-Worlds Analysis for Dueling Bandits with Borda Winner." International Conference on Learning Representations, 2026.

Markdown

[Hu et al. "Best-of-Three-Worlds Analysis for Dueling Bandits with Borda Winner." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hu2026iclr-bestofthreeworlds/)

BibTeX

@inproceedings{hu2026iclr-bestofthreeworlds,
  title     = {{Best-of-Three-Worlds Analysis for Dueling Bandits with Borda Winner}},
  author    = {Hu, Zirui and Zhang, Tingyu and Kong, Fang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/hu2026iclr-bestofthreeworlds/}
}