High-Probability Regret Bounds for Bandit Online Linear Optimization
Abstract
We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most $O^*(\sqrt{T})$ against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded only in expectation and only against an oblivious adversary. We obtain the same dependence on the dimension ($n^{3/2}$) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
Cite

Text

Bartlett et al. "High-Probability Regret Bounds for Bandit Online Linear Optimization." Annual Conference on Computational Learning Theory, 2008.

Markdown

[Bartlett et al. "High-Probability Regret Bounds for Bandit Online Linear Optimization." Annual Conference on Computational Learning Theory, 2008.](https://mlanthology.org/colt/2008/bartlett2008colt-high/)

BibTeX
@inproceedings{bartlett2008colt-high,
title = {{High-Probability Regret Bounds for Bandit Online Linear Optimization}},
author = {Bartlett, Peter L. and Dani, Varsha and Hayes, Thomas P. and Kakade, Sham M. and Rakhlin, Alexander and Tewari, Ambuj},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2008},
pages = {335--342},
url = {https://mlanthology.org/colt/2008/bartlett2008colt-high/}
}