Contextual Combinatorial Cascading Bandits

Abstract

We propose contextual combinatorial cascading bandits, a combinatorial online learning game in which, at each time step, a learning agent is given a set of contextual information, selects a list of items, and observes the stochastic outcomes of a prefix of the selected items, determined by some stopping criterion. In online recommendation, the stopping criterion might be the first item the user selects; in network routing, it might be the first blocked edge in a path. We consider position discounts in the list order, so that the agent's reward is discounted depending on the position where the stopping criterion is met. We design a UCB-type algorithm, C^3-UCB, for this problem, prove an n-step regret bound of \tilde{O}(\sqrt{n}) in the general setting, and give finer analysis for two special cases. Our work generalizes existing studies in several directions, including contextual information, position discounts, and a more general cascading bandit model. Experiments on synthetic and real datasets demonstrate the advantage of involving contextual information and position discounts.
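The interaction protocol in the abstract can be illustrated with a minimal simulation. This is only a sketch of the cascading feedback model with position discounts, not the authors' C^3-UCB algorithm; the function name, click-probability table, and discount vector are all hypothetical choices for illustration.

```python
import random

def cascading_round(ranked_items, click_prob, discounts, rng):
    """Simulate one round of cascading feedback with position discounts.

    The ranked list is examined top-down; observation stops at the first
    item the user clicks (the stopping criterion in recommendation).
    Returns the position-discounted reward (0.0 if nothing is clicked)
    and the observed prefix of the list.
    """
    for pos, item in enumerate(ranked_items):
        if rng.random() < click_prob[item]:
            # Only the prefix up to the stopping position is observed,
            # and the reward is discounted by the position's weight.
            return discounts[pos], ranked_items[: pos + 1]
    # No stop: the whole list is observed, no reward is earned.
    return 0.0, list(ranked_items)

rng = random.Random(0)
# Item 0 is always clicked, so the round stops at position 0.
reward, observed = cascading_round([0, 1], {0: 1.0, 1: 1.0}, [1.0, 0.5], rng)
print(reward, observed)  # 1.0 [0]
```

A learning agent would wrap this loop, using the observed prefix to update per-item estimates (in the contextual setting, a linear model over item features) and an upper-confidence rule to pick the next list.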

Cite

Text

Li et al. "Contextual Combinatorial Cascading Bandits." International Conference on Machine Learning, 2016.

Markdown

[Li et al. "Contextual Combinatorial Cascading Bandits." International Conference on Machine Learning, 2016.](https://mlanthology.org/icml/2016/li2016icml-contextual/)

BibTeX

@inproceedings{li2016icml-contextual,
  title     = {{Contextual Combinatorial Cascading Bandits}},
  author    = {Li, Shuai and Wang, Baoxiang and Zhang, Shengyu and Chen, Wei},
  booktitle = {International Conference on Machine Learning},
  year      = {2016},
  pages     = {1245--1253},
  volume    = {48},
  url       = {https://mlanthology.org/icml/2016/li2016icml-contextual/}
}