Battle of Bandits

Abstract

We introduce Battling-Bandits – an online learning framework in which, given a set of n arms, the learner selects a subset of k ≥ 2 arms in each round and subsequently observes stochastic feedback indicating the winner of the round. This framework generalizes the standard Dueling-Bandit framework and applies to several practical scenarios, such as medical treatment preferences, recommender systems, and search engine optimization, where it is easier and more effective to collect feedback for multiple options simultaneously. We develop a novel class of pairwise-subset choice models for modelling the subset-wise winner feedback and propose three algorithms – Battling-Doubler, Battling-MultiSBM and Battling-Duel: the first two are designed for a special class of linear-link based choice models, while the third applies to a much more general class of pairwise-subset choice models with a Condorcet winner. We also analyze their regret guarantees and show the optimality of Battling-Duel by proving a matching regret lower bound of Ω(n log T), which (perhaps surprisingly) shows that the flexibility of playing size-k subsets does not help gather information faster than the corresponding dueling case (k = 2), at least under our feedback model. The efficacy of our algorithms is demonstrated through extensive experimental evaluations on a variety of synthetic and real-world datasets.
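To make the subset-wise winner feedback concrete, here is a minimal sketch of one round of the interaction, assuming a multinomial-logit (Plackett–Luce) choice model as the winner distribution. This is only one simple instance of subset-wise feedback; the utilities `theta`, the function name, and the MNL assumption are illustrative and not taken from the paper, whose pairwise-subset choice models are more general.

```python
import numpy as np

def sample_winner(theta, subset, rng):
    """Sample the winner of one 'battle' among the arms in `subset`.

    Arm i in `subset` wins with probability proportional to exp(theta[i]),
    i.e. a multinomial-logit (Plackett-Luce) winner model -- a hypothetical
    stand-in for the paper's more general pairwise-subset choice models.
    """
    scores = np.exp(np.asarray(theta)[list(subset)])
    probs = scores / scores.sum()
    return subset[rng.choice(len(subset), p=probs)]

rng = np.random.default_rng(0)
theta = [0.0, 0.5, 1.0, 2.0, -1.0]  # hypothetical utilities for n = 5 arms
subset = [0, 2, 3]                  # learner plays a size-k subset, k = 3
winner = sample_winner(theta, subset, rng)  # stochastic winner feedback
```

In repeated rounds, the learner only ever sees such winner indices, and regret is measured against always playing the best arm; the lower bound above says this k-wise feedback is, order-wise, no more informative than dueling feedback.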

Cite

Text

Saha and Gopalan. "Battle of Bandits." Conference on Uncertainty in Artificial Intelligence, 2018.

Markdown

[Saha and Gopalan. "Battle of Bandits." Conference on Uncertainty in Artificial Intelligence, 2018.](https://mlanthology.org/uai/2018/saha2018uai-battle/)

BibTeX

@inproceedings{saha2018uai-battle,
  title     = {{Battle of Bandits}},
  author    = {Saha, Aadirupa and Gopalan, Aditya},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2018},
  pages     = {805-814},
  url       = {https://mlanthology.org/uai/2018/saha2018uai-battle/}
}