Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms
Abstract
We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning. Using the ramp loss, we derive a new margin-based regret bound in terms of standard sequential complexity measures of a benchmark class of real-valued regression functions. Using the hinge loss, we derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regressors. Under realizability assumptions, our results also yield classical regret bounds.
Cite
Text
Foster and Krishnamurthy. "Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms." Neural Information Processing Systems, 2018.
Markdown
[Foster and Krishnamurthy. "Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/foster2018neurips-contextual/)
BibTeX
@inproceedings{foster2018neurips-contextual,
title = {{Contextual Bandits with Surrogate Losses: Margin Bounds and Efficient Algorithms}},
author = {Foster, Dylan J and Krishnamurthy, Akshay},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {2621--2632},
url = {https://mlanthology.org/neurips/2018/foster2018neurips-contextual/}
}