Combinatorial Bandits
Abstract
We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set $S \subseteq \{0,1\}^d$ and suffers a loss that is the sum of the losses of those vector components that equal to one. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the "bandit" setting in which the forecaster has only access to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order $\sqrt{nd \ln |S|}$ where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations for various interesting choices of S exist.
Cite
Text
Cesa-Bianchi and Lugosi. "Combinatorial Bandits." Annual Conference on Computational Learning Theory, 2009.Markdown
[Cesa-Bianchi and Lugosi. "Combinatorial Bandits." Annual Conference on Computational Learning Theory, 2009.](https://mlanthology.org/colt/2009/cesabianchi2009colt-combinatorial/)BibTeX
@inproceedings{cesabianchi2009colt-combinatorial,
title = {{Combinatorial Bandits}},
author = {Cesa-Bianchi, Nicolò and Lugosi, Gábor},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2009},
url = {https://mlanthology.org/colt/2009/cesabianchi2009colt-combinatorial/}
}