Contextual Combinatorial Multi-Armed Bandits with Volatile Arms and Submodular Reward
Abstract
In this paper, we study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework tailored to volatile arms and submodular reward functions. CC-MAB inherits properties from both contextual bandits and combinatorial bandits: it selects a set of arms in each round based on the side information (a.k.a. context) associated with the arms. By "volatile arms", we mean that the set of arms available for selection may change from round to round; by "submodular rewards", we mean that the total reward achieved by the selected arms is not a simple sum of individual rewards but exhibits diminishing returns determined by the relations among the selected arms (e.g., relevance and redundancy). Volatile arms and submodular rewards arise in many real-world applications, e.g., recommender systems and crowdsourcing, where multi-armed bandit (MAB) strategies are extensively applied. Although existing works investigate these issues separately on top of the standard MAB, jointly considering all of them in a single MAB problem requires a very different algorithm design and regret analysis. Our algorithm CC-MAB provides an online decision-making policy in a contextual and combinatorial bandit setting and effectively addresses the issues raised by volatile arms and submodular reward functions. The proposed algorithm is proved to achieve $O(cT^{\frac{2\alpha+D}{3\alpha+D}}\log(T))$ regret over a span of $T$ rounds. The performance of CC-MAB is evaluated in experiments on a real-world crowdsourcing dataset, and the results show that our algorithm outperforms the prior art.
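To make the selection loop concrete, below is a minimal Python sketch of one plausible CC-MAB-style round, assuming (consistent with the setting the abstract describes) that contexts live in $[0,1]^D$, that the context space is uniformly partitioned into hypercubes whose empirical means serve as reward estimates, and that arms are picked greedily by estimated marginal gain under the submodular reward. All names and parameter choices here (`CCMABSketch`, `cube_index`, `explore_threshold`, `marginal_gain`) are illustrative assumptions, not the authors' interface.

```python
# Hypothetical sketch of a CC-MAB-style selection round. The partition-based
# estimates and the greedy step follow the setting described in the abstract;
# all identifiers and parameter choices are illustrative assumptions.

def cube_index(context, n_bins):
    """Map a context in [0,1]^D to its hypercube in a uniform partition."""
    return tuple(min(int(c * n_bins), n_bins - 1) for c in context)

class CCMABSketch:
    def __init__(self, n_bins, budget, explore_threshold):
        self.n_bins = n_bins                # hypercubes per dimension
        self.budget = budget                # number of arms selected per round
        self.threshold = explore_threshold  # samples before a cube counts as explored
        self.counts = {}                    # cube -> number of observations
        self.means = {}                     # cube -> empirical mean individual reward

    def select(self, contexts, marginal_gain):
        """Pick up to `budget` arms from the current (volatile) arm set.

        `marginal_gain(selected, i, estimate)` should return the estimated
        marginal contribution of arm i to the submodular total reward, given
        the indices already in `selected` and the arm's mean-reward estimate.
        """
        cubes = [cube_index(c, self.n_bins) for c in contexts]
        # Exploration: arms whose hypercube has too few samples get priority.
        under = [i for i, q in enumerate(cubes)
                 if self.counts.get(q, 0) < self.threshold]
        chosen = under[:self.budget]
        pool = [i for i in range(len(contexts)) if i not in chosen]
        # Exploitation: fill the remaining budget greedily by marginal gain,
        # the standard approach for monotone submodular objectives.
        while len(chosen) < self.budget and pool:
            best = max(pool, key=lambda i: marginal_gain(
                chosen, i, self.means.get(cubes[i], 0.0)))
            chosen.append(best)
            pool.remove(best)
        return chosen

    def update(self, contexts, chosen, rewards):
        """Update per-cube statistics from the observed individual rewards."""
        for i, r in zip(chosen, rewards):
            q = cube_index(contexts[i], self.n_bins)
            n = self.counts.get(q, 0)
            self.means[q] = (self.means[q] * n + r) / (n + 1) if n else r
            self.counts[q] = n + 1
```

In the paper, the partition granularity and the exploration threshold are not fixed constants as above but increasing functions of the horizon $T$, which is what yields the stated $O(cT^{\frac{2\alpha+D}{3\alpha+D}}\log(T))$ regret bound.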
Cite
Text
Chen et al. "Contextual Combinatorial Multi-Armed Bandits with Volatile Arms and Submodular Reward." Neural Information Processing Systems, 2018.Markdown
[Chen et al. "Contextual Combinatorial Multi-Armed Bandits with Volatile Arms and Submodular Reward." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/chen2018neurips-contextual/)BibTeX
@inproceedings{chen2018neurips-contextual,
title = {{Contextual Combinatorial Multi-Armed Bandits with Volatile Arms and Submodular Reward}},
author = {Chen, Lixing and Xu, Jie and Lu, Zhuo},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {3247--3256},
url = {https://mlanthology.org/neurips/2018/chen2018neurips-contextual/}
}