Generic Exploration and K-Armed Voting Bandits

Abstract

We study a stochastic online learning scheme with partial feedback, where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm able to cope with various utility functions, from multi-armed bandit settings to dueling bandits. The primary application of this setting is to offer a natural generalization of dueling bandits to situations where the environment parameters reflect the idiosyncratic preferences of a mixed crowd.
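To make the voting-bandit setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of pure exploration over pairwise "duels": pairs of arms are sampled uniformly, an empirical preference matrix is estimated, and the arm with the highest empirical Copeland score (the number of rivals it beats in a majority of duels) is recommended. The function `sample_pair`, the toy preference matrix `P`, and the fixed exploration budget are assumptions for illustration only.

```python
import numpy as np


def copeland_winner_exploration(sample_pair, n_arms, horizon, rng=None):
    """Illustrative uniform pure-exploration loop for a voting-bandit setting.

    `sample_pair(i, j)` is assumed to return 1 if arm i beats arm j in a
    noisy duel and 0 otherwise (a stand-in for crowd preference feedback).
    After the exploration budget is spent, the arm with the highest
    empirical Copeland score is recommended.
    """
    rng = rng or np.random.default_rng()
    wins = np.zeros((n_arms, n_arms))
    counts = np.zeros((n_arms, n_arms))

    for _ in range(horizon):
        # Sample a pair of distinct arms uniformly at random.
        i, j = rng.choice(n_arms, size=2, replace=False)
        outcome = sample_pair(i, j)
        wins[i, j] += outcome
        wins[j, i] += 1 - outcome
        counts[i, j] += 1
        counts[j, i] += 1

    # Empirical preference matrix; unobserved pairs default to a tie (0.5).
    p_hat = np.where(counts > 0, wins / np.maximum(counts, 1), 0.5)
    copeland = (p_hat > 0.5).sum(axis=1)
    return int(np.argmax(copeland))


if __name__ == "__main__":
    # Hypothetical ground-truth preference matrix for 4 arms.
    P = np.array([
        [0.5, 0.6, 0.7, 0.8],
        [0.4, 0.5, 0.6, 0.7],
        [0.3, 0.4, 0.5, 0.6],
        [0.2, 0.3, 0.4, 0.5],
    ])
    rng = np.random.default_rng(0)
    duel = lambda i, j: int(rng.random() < P[i, j])
    print(copeland_winner_exploration(duel, n_arms=4, horizon=5000, rng=rng))
```

The paper's contribution is a generic, confidence-based exploration scheme covering such utility functions; the uniform budget and plain empirical argmax above are simplifications for exposition.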

Cite

Text

Urvoy et al. "Generic Exploration and K-Armed Voting Bandits." International Conference on Machine Learning, 2013.

Markdown

[Urvoy et al. "Generic Exploration and K-Armed Voting Bandits." International Conference on Machine Learning, 2013.](https://mlanthology.org/icml/2013/urvoy2013icml-generic/)

BibTeX

@inproceedings{urvoy2013icml-generic,
  title     = {{Generic Exploration and K-Armed Voting Bandits}},
  author    = {Urvoy, Tanguy and Clerot, Fabrice and Féraud, Raphael and Naamane, Sami},
  booktitle = {International Conference on Machine Learning},
  year      = {2013},
  pages     = {91--99},
  volume    = {28},
  url       = {https://mlanthology.org/icml/2013/urvoy2013icml-generic/}
}