Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Abstract

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K *actions* in response to the observed *context*, and observes the *reward* only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only Õ(√(KT)) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
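To make the interaction protocol in the abstract concrete, here is a minimal sketch of a contextual bandit loop in which the learner's policy is assumed to expose a distribution over the K actions and an update hook (for example, one backed by a cost-sensitive classification oracle, as the paper assumes). The `environment` and `policy` interfaces are hypothetical placeholders for illustration, not the paper's algorithm.

```python
import numpy as np

def contextual_bandit_loop(environment, policy, T, K, seed=0):
    """Illustrative contextual bandit interaction over T rounds.

    environment: hypothetical object with get_context() and get_reward(x, a).
    policy: hypothetical object with probs(x) -> length-K distribution and
            update(x, a, r, p_a), e.g. feeding an importance-weighted cost
            vector to a cost-sensitive classification oracle.
    """
    rng = np.random.default_rng(seed)
    total_reward = 0.0
    for t in range(T):
        x = environment.get_context()      # observe the context
        p = policy.probs(x)                # distribution over the K actions
        a = rng.choice(K, p=p)             # take one action
        r = environment.get_reward(x, a)   # reward observed only for action a
        policy.update(x, a, r, p[a])       # learner update from bandit feedback
        total_reward += r
    return total_reward
```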

Cite

Text

Agarwal et al. "Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits." International Conference on Machine Learning, 2014.

Markdown

[Agarwal et al. "Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits." International Conference on Machine Learning, 2014.](https://mlanthology.org/icml/2014/agarwal2014icml-taming/)

BibTeX

@inproceedings{agarwal2014icml-taming,
  title     = {{Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits}},
  author    = {Agarwal, Alekh and Hsu, Daniel and Kale, Satyen and Langford, John and Li, Lihong and Schapire, Robert},
  booktitle = {International Conference on Machine Learning},
  year      = {2014},
  pages     = {1638--1646},
  volume    = {32},
  url       = {https://mlanthology.org/icml/2014/agarwal2014icml-taming/}
}