Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

Abstract

Regret bounds in online learning compare the player’s performance to $L^*$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale as $\sqrt{T}$ with the time horizon $T$. The more refined concept of a first-order regret bound replaces this with a scaling $\sqrt{L^*}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.
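
To make the abstract’s comparison concrete, the display below sketches the standard contextual-bandit definitions; the policy class $\Pi$, contexts $x_t$, chosen actions $a_t$, and losses $\ell_t \in [0,1]^K$ are conventional notation assumed here, not taken from this page.

$$
L^* \;=\; \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t\big(\pi(x_t)\big),
\qquad
R_T \;=\; \sum_{t=1}^{T} \ell_t(a_t) \;-\; L^* .
$$

A standard guarantee bounds the expected regret $R_T$ by a quantity scaling as $\sqrt{T}$, whereas a first-order bound scales as $\sqrt{L^*}$; since the losses lie in $[0,1]$, one always has $L^* \le T$, so a first-order bound is never worse and can be much stronger when some fixed policy incurs small cumulative loss.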

Cite

Text

Allen-Zhu et al. "Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits." International Conference on Machine Learning, 2018.

Markdown

[Allen-Zhu et al. "Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/allenzhu2018icml-make/)

BibTeX

@inproceedings{allenzhu2018icml-make,
  title     = {{Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits}},
  author    = {Allen-Zhu, Zeyuan and Bubeck, Sebastien and Li, Yuanzhi},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {186-194},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/allenzhu2018icml-make/}
}