High-Probability Regret Bounds for Bandit Online Linear Optimization

Abstract

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most $O^*(\sqrt{T})$ against an adaptive adversary. This improves on the previous algorithm [8], whose regret bound holds only in expectation and only against an oblivious adversary. We obtain the same dependence on the dimension ($n^{3/2}$) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information vs. bandit settings.
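The optimistic-estimate technique of Auer et al. [2] mentioned above is easiest to see in the simpler K-armed bandit setting. The sketch below is a simplified EXP3.P-style learner: every arm's importance-weighted gain estimate receives an upward $\beta/p$ bonus, and this deliberate positive bias is what upgrades in-expectation regret bounds to high-probability ones. The parameter choices (gamma, eta, beta) are illustrative assumptions, not the tuned constants from the literature.

```python
import numpy as np

def exp3p(means, T, gamma=0.1, delta=0.05, seed=0):
    """Simplified EXP3.P-style learner on Bernoulli arms.

    Returns the total realized gain and the final sampling
    distribution. Parameter choices are illustrative only.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    beta = np.sqrt(np.log(K / delta) / (K * T))  # optimism level
    eta = gamma / (3.0 * K)                      # learning rate
    log_w = np.zeros(K)                          # log-weights, for numerical stability
    total = 0.0
    for _ in range(T):
        w = np.exp(log_w - log_w.max())
        # Mix the exponential-weights distribution with uniform exploration.
        p = (1.0 - gamma) * w / w.sum() + gamma / K
        i = rng.choice(K, p=p)
        x = float(rng.random() < means[i])       # Bernoulli gain in [0, 1]
        total += x
        # Optimistic estimate: every arm gets a +beta/p bonus, so the
        # estimate is biased upward; this bias is what yields
        # high-probability (rather than in-expectation) regret bounds.
        xhat = beta / p
        xhat[i] += x / p[i]
        log_w += eta * xhat
    return total, p

total, p = exp3p([0.3, 0.5, 0.7], T=20000)
```

The paper's contribution is to carry a refinement of this idea from the K-armed case to bandit online linear optimization over an $n$-dimensional decision set, where the optimistic correction must be applied to an estimated loss vector rather than to per-arm gains.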

Cite

Text

Bartlett et al. "High-Probability Regret Bounds for Bandit Online Linear Optimization." Annual Conference on Computational Learning Theory, 2008.

Markdown

[Bartlett et al. "High-Probability Regret Bounds for Bandit Online Linear Optimization." Annual Conference on Computational Learning Theory, 2008.](https://mlanthology.org/colt/2008/bartlett2008colt-high/)

BibTeX

@inproceedings{bartlett2008colt-high,
  title     = {{High-Probability Regret Bounds for Bandit Online Linear Optimization}},
  author    = {Bartlett, Peter L. and Dani, Varsha and Hayes, Thomas P. and Kakade, Sham M. and Rakhlin, Alexander and Tewari, Ambuj},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2008},
  pages     = {335--342},
  url       = {https://mlanthology.org/colt/2008/bartlett2008colt-high/}
}