Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits
Abstract
We consider the combinatorial semi-bandit problem and present a new algorithm with a best-of-both-worlds regret guarantee: its regret is bounded near-optimally in both the stochastic and adversarial regimes. In the stochastic regime, we prove a variance-dependent regret bound that depends on the tight suboptimality gap introduced by Kveton et al. (2015), with a good leading constant. In the adversarial regime, we show that the same algorithm simultaneously obtains various data-dependent regret bounds. Our algorithm is based on the follow-the-regularized-leader framework with a refined regularizer and an adaptive learning rate. Finally, we evaluate the proposed algorithm numerically and confirm its superior or competitive performance compared with existing algorithms, including Thompson sampling, under most settings.
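The paper's algorithm is a follow-the-regularized-leader (FTRL) method with a refined regularizer and an adaptive learning rate; the sketch below is only a generic illustration of the FTRL recipe for m-set combinatorial semi-bandits, not the authors' construction. It substitutes a plain negative-entropy regularizer, a simple 1/sqrt(t) learning-rate schedule in place of the paper's adaptive rate, Madow's systematic sampling to realize the marginal vector, and standard importance-weighted loss estimates; all function names and the Bernoulli loss model are illustrative assumptions.

# Minimal FTRL sketch for m-set combinatorial semi-bandits (illustrative,
# not the paper's exact regularizer or learning rate).
import numpy as np

def ftrl_step(cum_losses, eta, m):
    """argmin_x <cum_losses, x> + (1/eta) * sum_i x_i log x_i over
    {x in [0,1]^d : sum_i x_i = m}. The minimizer has the closed form
    x_i = min(1, c * exp(-eta * L_i)); find the normalizer c by bisection."""
    w = np.exp(-eta * (cum_losses - cum_losses.min()))
    lo, hi = 1e-12, 1e12
    for _ in range(100):
        c = np.sqrt(lo * hi)                     # geometric bisection on c
        if np.minimum(1.0, c * w).sum() > m:
            hi = c
        else:
            lo = c
    return np.minimum(1.0, np.sqrt(lo * hi) * w)

def sample_mset(x, rng):
    """Madow's systematic sampling: returns a size-m subset whose
    inclusion probabilities equal the marginals x (with sum(x) = m)."""
    u = rng.random()
    cum = np.concatenate(([0.0], np.cumsum(x)))
    return np.where(np.ceil(cum[1:] - u) - np.ceil(cum[:-1] - u) >= 1)[0]

def ftrl_semibandit(loss_fn, d, m, T, seed=0):
    rng = np.random.default_rng(seed)
    cum_est = np.zeros(d)                        # cumulative loss estimates
    total_loss = 0.0
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(d / m) / t)         # stand-in for the adaptive rate
        x = ftrl_step(cum_est, eta, m)
        chosen = sample_mset(x, rng)
        losses = loss_fn(t)                      # full loss vector in [0, 1]^d
        total_loss += losses[chosen].sum()
        est = np.zeros(d)                        # importance-weighted estimates
        est[chosen] = losses[chosen] / np.maximum(x[chosen], 1e-12)
        cum_est += est
    return total_loss

if __name__ == "__main__":
    d, m, T = 10, 3, 5000
    means = np.linspace(0.1, 0.9, d)             # stochastic-regime example
    env_rng = np.random.default_rng(1)
    loss = ftrl_semibandit(lambda t: env_rng.binomial(1, means).astype(float), d, m, T)
    print("average per-round loss:", loss / T)
    print("best fixed m-set per-round loss:", means[:m].sum())

The negative-entropy choice is what makes the FTRL step cheap here: it reduces to a one-dimensional search for the normalizer c, and Madow's sampling then turns the marginal vector into a concrete size-m action. As T grows, the printed average per-round loss should approach the expected loss of the best fixed size-m subset.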
Cite

Text:
Tsuchiya et al. "Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits." Artificial Intelligence and Statistics, 2023.

Markdown:
[Tsuchiya et al. "Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/tsuchiya2023aistats-further/)

BibTeX:
@inproceedings{tsuchiya2023aistats-further,
title = {{Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits}},
author = {Tsuchiya, Taira and Ito, Shinji and Honda, Junya},
booktitle = {Artificial Intelligence and Statistics},
year = {2023},
pages = {8117--8144},
volume = {206},
url = {https://mlanthology.org/aistats/2023/tsuchiya2023aistats-further/}
}