Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms

Abstract

We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning. Using the ramp loss, we derive a new margin-based regret bound in terms of standard sequential complexity measures of a benchmark class of real-valued regression functions. Using the hinge loss, we derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regressors. Under realizability assumptions, our results also yield classical regret bounds.
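For reference, the two surrogate losses named in the abstract are conventionally defined as follows (these are the standard textbook definitions, stated here for context rather than taken from the paper itself). For a margin value $z$, both upper-bound or approximate the zero-one loss $\mathbf{1}\{z \le 0\}$:

```latex
% Hinge loss: convex surrogate for the zero-one loss.
\ell_{\mathrm{hinge}}(z) \;=\; \max\{0,\; 1 - z\}

% Ramp loss: clipped (bounded, non-convex) variant of the hinge loss.
\ell_{\mathrm{ramp}}(z) \;=\; \min\bigl\{1,\; \max\{0,\; 1 - z\}\bigr\}
```

The ramp loss is bounded in $[0,1]$, which makes it suitable for margin-based regret bounds, while the hinge loss is convex, which typically enables efficient algorithms.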

Cite

Text

Foster and Krishnamurthy. "Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms." Neural Information Processing Systems, 2018.

Markdown

[Foster and Krishnamurthy. "Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/foster2018neurips-contextual/)

BibTeX

@inproceedings{foster2018neurips-contextual,
  title     = {{Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms}},
  author    = {Foster, Dylan J and Krishnamurthy, Akshay},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {2621--2632},
  url       = {https://mlanthology.org/neurips/2018/foster2018neurips-contextual/}
}