An Efficient Bandit Algorithm for sqrt(T) Regret in Online Multiclass Prediction?

Abstract

We pose the open problem of whether an efficient bandit algorithm exists achieving $O(\sqrt{T})$ regret for online multiclass prediction. The full-information setting admits efficient $O(\sqrt{T})$ algorithms, and inefficient bandit algorithms with this regret are known, but no computationally efficient bandit algorithm achieves it. We describe the problem precisely, including the multiclass hinge loss formulation, and survey existing results including the Banditron.

Cite

Text

Abernethy and Rakhlin. "An Efficient Bandit Algorithm for sqrt(T) Regret in Online Multiclass Prediction?." Annual Conference on Computational Learning Theory, 2009.

Markdown

[Abernethy and Rakhlin. "An Efficient Bandit Algorithm for sqrt(T) Regret in Online Multiclass Prediction?." Annual Conference on Computational Learning Theory, 2009.](https://mlanthology.org/colt/2009/abernethy2009colt-efficient/)

BibTeX

@inproceedings{abernethy2009colt-efficient,
  title     = {{An Efficient Bandit Algorithm for sqrt(T) Regret in Online Multiclass Prediction?}},
  author    = {Abernethy, Jacob D. and Rakhlin, Alexander},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2009},
  url       = {https://mlanthology.org/colt/2009/abernethy2009colt-efficient/}
}