Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
Abstract
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / ∆) \log n) and O(\sqrt{K} L n \log n) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and ∆is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
Cite
Text
Kveton et al. "Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits." International Conference on Artificial Intelligence and Statistics, 2015.Markdown
[Kveton et al. "Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits." International Conference on Artificial Intelligence and Statistics, 2015.](https://mlanthology.org/aistats/2015/kveton2015aistats-tight/)BibTeX
@inproceedings{kveton2015aistats-tight,
title = {{Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits}},
author = {Kveton, Branislav and Wen, Zheng and Ashkan, Azin and Szepesvári, Csaba},
booktitle = {International Conference on Artificial Intelligence and Statistics},
year = {2015},
url = {https://mlanthology.org/aistats/2015/kveton2015aistats-tight/}
}