Beat the Mean Bandit

Abstract

The Dueling Bandits Problem is an online learning framework in which actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). It models settings where absolute rewards are difficult to elicit but pairwise preferences are readily available. In this paper, we extend the Dueling Bandits Problem to a relaxed setting where preference magnitudes can violate transitivity. We present the first algorithm for this more general Dueling Bandits Problem and provide theoretical guarantees in both the online and the PAC settings. Furthermore, we show that the new algorithm has stronger guarantees than existing results even in the original Dueling Bandits Problem, which we validate empirically.
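To make the interaction model concrete, here is a minimal Python sketch of a dueling-bandits loop: the learner only observes noisy pairwise comparisons drawn from a preference matrix, and repeatedly compares candidates against a randomly drawn opponent from the working set, eliminating empirically poor bandits once confidence intervals separate. This is an illustrative simplification in the spirit of Beat-the-Mean, not the paper's exact algorithm; the function names, the confidence radius, and the elimination rule are assumptions for exposition.

```python
import math
import random

def duel(P, i, j, rng):
    """Noisy comparison oracle: True if bandit i beats bandit j.
    P[i][j] is the probability that i wins a duel against j."""
    return rng.random() < P[i][j]

def beat_the_mean_sketch(P, budget, delta=0.1, rng=None):
    """Illustrative elimination loop (NOT the paper's exact algorithm).

    Each bandit is scored by its empirical win rate against a random
    opponent from the working set (a stand-in for the 'mean bandit');
    the worst bandit is dropped once its confidence interval separates
    from the best one. Returns the surviving/best bandit index."""
    rng = rng or random.Random(0)
    working = list(range(len(P)))
    wins = {b: 0 for b in working}
    plays = {b: 0 for b in working}
    for _ in range(budget):
        if len(working) == 1:
            break
        # Play the least-explored bandit against a random opponent.
        b = min(working, key=lambda x: plays[x])
        opp = rng.choice([c for c in working if c != b])
        if duel(P, b, opp, rng):
            wins[b] += 1
        plays[b] += 1
        # Eliminate when the worst bandit's upper bound falls below
        # the best bandit's lower bound (simplified Hoeffding-style radius).
        n = min(plays[b2] for b2 in working)
        if n > 0:
            c = math.sqrt(math.log(1.0 / delta) / n)
            means = {b2: wins[b2] / plays[b2] for b2 in working}
            best = max(working, key=lambda x: means[x])
            worst = min(working, key=lambda x: means[x])
            if means[best] - c > means[worst] + c:
                working.remove(worst)
    return max(working, key=lambda x: wins[x] / max(plays[x], 1))
```

For example, with a preference matrix in which bandit 0 beats every other bandit with probability 0.9, the sketch quickly prunes the dominated arms and returns index 0. The real algorithm additionally removes an eliminated bandit's comparison outcomes from the remaining bandits' statistics, which this sketch omits for brevity.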

Cite

Text

Yue and Joachims. "Beat the Mean Bandit." International Conference on Machine Learning, 2011.

Markdown

[Yue and Joachims. "Beat the Mean Bandit." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/yue2011icml-beat/)

BibTeX

@inproceedings{yue2011icml-beat,
  title     = {{Beat the Mean Bandit}},
  author    = {Yue, Yisong and Joachims, Thorsten},
  booktitle = {International Conference on Machine Learning},
  year      = {2011},
  pages     = {241--248},
  url       = {https://mlanthology.org/icml/2011/yue2011icml-beat/}
}