Beat the Mean Bandit

Abstract

The Dueling Bandits Problem is an online learning framework in which actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). It models settings where absolute rewards are difficult to elicit but pairwise preferences are readily available. In this paper, we extend the Dueling Bandits Problem to a relaxed setting where preference magnitudes can violate transitivity. We present the first algorithm for this more general Dueling Bandits Problem and provide theoretical guarantees in both the online and the PAC settings. Furthermore, we show that the new algorithm has stronger guarantees than existing results even in the original Dueling Bandits Problem, which we validate empirically.
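To make the interaction model concrete, here is a minimal Python sketch of a dueling-bandits loop: the learner only observes noisy pairwise comparisons drawn from a preference matrix, and repeatedly compares candidates against a randomly drawn opponent from the working set, eliminating empirically poor bandits once confidence intervals separate. This is an illustrative simplification in the spirit of Beat-the-Mean, not the paper's exact algorithm; the function names, the confidence radius, and the elimination rule are assumptions for exposition.

```python
import math
import random

def duel(P, i, j, rng):
    """Noisy comparison oracle: True if bandit i beats bandit j.
    P[i][j] is the probability that i wins a duel against j."""
    return rng.random() < P[i][j]

def beat_the_mean_sketch(P, budget, delta=0.1, rng=None):
    """Illustrative elimination loop (NOT the paper's exact algorithm).

    Each bandit is scored by its empirical win rate against a random
    opponent from the working set (a stand-in for the 'mean bandit');
    the worst bandit is dropped once its confidence interval separates
    from the best one. Returns the surviving/best bandit index."""
    rng = rng or random.Random(0)
    working = list(range(len(P)))
    wins = {b: 0 for b in working}
    plays = {b: 0 for b in working}
    for _ in range(budget):
        if len(working) == 1:
            break
        # Play the least-explored bandit against a random opponent.
        b = min(working, key=lambda x: plays[x])
        opp = rng.choice([c for c in working if c != b])
        if duel(P, b, opp, rng):
            wins[b] += 1
        plays[b] += 1
        # Eliminate when the worst bandit's upper bound falls below
        # the best bandit's lower bound (simplified Hoeffding-style radius).
        n = min(plays[b2] for b2 in working)
        if n > 0:
            c = math.sqrt(math.log(1.0 / delta) / n)
            means = {b2: wins[b2] / plays[b2] for b2 in working}
            best = max(working, key=lambda x: means[x])
            worst = min(working, key=lambda x: means[x])
            if means[best] - c > means[worst] + c:
                working.remove(worst)
    return max(working, key=lambda x: wins[x] / max(plays[x], 1))
```

For example, with a preference matrix in which bandit 0 beats every other bandit with probability 0.9, the sketch quickly prunes the dominated arms and returns index 0. The real algorithm additionally removes an eliminated bandit's comparison outcomes from the remaining bandits' statistics, which this sketch omits for brevity.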

Cite

Text

Yue and Joachims. "Beat the Mean Bandit." International Conference on Machine Learning, 2011.

Markdown

[Yue and Joachims. "Beat the Mean Bandit." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/yue2011icml-beat/)

BibTeX

@inproceedings{yue2011icml-beat,
  title     = {{Beat the Mean Bandit}},
  author    = {Yue, Yisong and Joachims, Thorsten},
  booktitle = {International Conference on Machine Learning},
  year      = {2011},
  pages     = {241--248},
  url       = {https://mlanthology.org/icml/2011/yue2011icml-beat/}
}