Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the M-Set Semi-Bandit Problems
Abstract
We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner exactly selects $m$ arms from the total $d$ arms. In the adversarial setting, the best regret bound, known to be $\mathcal{O}(\sqrt{nmd})$ for time horizon $n$, is achieved by the well-known Follow-the-Regularized-Leader (FTRL) policy. However, this requires to explicitly compute the arm-selection probabilities via optimizing problems at each time step and sample according to them. This problem can be avoided by the Follow-the-Perturbed-Leader (FTPL) policy, which simply pulls the $m$ arms that rank among the $m$ smallest (estimated) loss with random perturbation. In this paper, we show that FTPL with a Fr\'echet perturbation also enjoys the near optimal regret bound $\mathcal{O}(\sqrt{nm}(\sqrt{d\log(d)}+m^{5/6}))$ in the adversarial setting and approaches best-of-both-world regret bounds, i.e., achieves a logarithmic regret for the stochastic setting. Moreover, our lower bounds show that the extra factors are unavoidable with our approach; any improvement would require a fundamentally different and more challenging method.
Cite
Text
Zhan et al. "Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the M-Set Semi-Bandit Problems." Advances in Neural Information Processing Systems, 2025.Markdown
[Zhan et al. "Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the M-Set Semi-Bandit Problems." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhan2025neurips-followtheperturbedleader/)BibTeX
@inproceedings{zhan2025neurips-followtheperturbedleader,
title = {{Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the M-Set Semi-Bandit Problems}},
author = {Zhan, Jingxin and Xin, Yuchen and Sun, Chenjie and Zhang, Zhihua},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhan2025neurips-followtheperturbedleader/}
}