Nonstochastic Contextual Combinatorial Bandits

Abstract

We study a contextual version of online combinatorial optimisation with full and semi-bandit feedback. In this sequential decision-making problem, an online learner has to select an action from a combinatorial decision space after seeing a vector-valued context in each round. As a result of its action, the learner incurs a loss that is a bilinear function of the context vector and the vector representation of the chosen action. We consider two natural versions of the problem: semi-bandit where the losses are revealed for each component appearing in the learner’s combinatorial action, and full-bandit where only the total loss is observed. We design computationally efficient algorithms based on a new loss estimator that takes advantage of the special structure of the problem, and show regret bounds order $\sqrt{T}$ with respect to the time horizon. The bounds demonstrate polynomial scaling with the relevant problem parameters which is shown to be nearly optimal. The theoretical results are complemented by a set of experiments on simulated data.

Cite

Text

Zierahn et al. "Nonstochastic Contextual Combinatorial Bandits." Artificial Intelligence and Statistics, 2023.

Markdown

[Zierahn et al. "Nonstochastic Contextual Combinatorial Bandits." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/zierahn2023aistats-nonstochastic/)

BibTeX

@inproceedings{zierahn2023aistats-nonstochastic,
  title     = {{Nonstochastic Contextual Combinatorial Bandits}},
  author    = {Zierahn, Lukas and Hoeven, Dirk and Cesa-Bianchi, Nicolò and Neu, Gergely},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {8771-8813},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/zierahn2023aistats-nonstochastic/}
}