Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits
Abstract
We propose novel algorithms with first- and second-order regret bounds for adversarial linear bandits. These regret bounds imply that our algorithms perform well when some action achieves a small cumulative loss or when the losses have small variance. Moreover, our algorithms require weaker assumptions than existing ones: they work on discrete action sets as well as continuous ones, need no a priori knowledge about the losses, and run efficiently if a linear optimization oracle for the action set is available. These results are obtained by combining optimistic online optimization, continuous multiplicative weight update methods, and a novel technique that we refer to as distribution truncation. We also show that the regret bounds of our algorithms are tight up to polylogarithmic factors.
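To make the multiplicative-weight-update component concrete, the sketch below shows a much simplified, EXP2-style exponential-weights learner for adversarial linear bandits on a finite action set. It is not the paper's algorithm: it omits the optimistic predictions and the distribution-truncation step, and the function names, the fixed learning and exploration rates, and the toy data are all illustrative assumptions.

# A minimal sketch of the exponential-weights (multiplicative weight update)
# idea on a *finite* action set, i.e. a classic EXP2-style learner for
# adversarial linear bandits. Purely illustrative; the paper's algorithms
# build optimistic predictions and distribution truncation on top of this.

import numpy as np

rng = np.random.default_rng(0)

def exp2_round(weights, actions, loss_vector, eta=0.05, gamma=0.05):
    """One round: sample an action, observe its loss, update the weights.

    actions:      (K, d) matrix whose rows are the feasible actions.
    loss_vector:  the adversary's hidden loss vector for this round
                  (only the scalar loss of the played action is observed).
    """
    K, d = actions.shape
    # Mix in uniform exploration so the sampling covariance is invertible.
    p = (1 - gamma) * weights / weights.sum() + gamma / K

    i = rng.choice(K, p=p)                      # play action a_i
    observed_loss = actions[i] @ loss_vector    # bandit feedback: a scalar

    # Unbiased least-squares estimate of the full loss vector:
    # hat_ell = Sigma^{-1} a_i (a_i^T ell), where Sigma = E_p[a a^T].
    sigma = (p[:, None, None] * np.einsum('ki,kj->kij', actions, actions)).sum(0)
    hat_ell = np.linalg.solve(sigma, actions[i]) * observed_loss

    # Multiplicative weight update with the estimated linear losses,
    # renormalized to keep the weights in a numerically safe range.
    weights = weights * np.exp(-eta * (actions @ hat_ell))
    weights = weights / weights.sum()
    return weights, observed_loss

# Toy run: d = 2, four actions, a fixed hidden loss vector.
actions = np.array([[1., 0.], [0., 1.], [1., 1.], [0.5, -0.5]])
weights = np.ones(len(actions))
hidden = np.array([0.3, 0.7])
total = 0.0
for _ in range(1000):
    weights, loss = exp2_round(weights, actions, hidden)
    total += loss
print(f"average loss: {total / 1000:.3f}  "
      f"(best fixed action: {min(actions @ hidden):.3f})")

The least-squares estimator keeps the loss estimates unbiased under bandit feedback, which is what lets the full-information multiplicative weight analysis carry over; the exploration floor gamma bounds the estimator's variance.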
Cite
Text
Ito et al. "Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits." Neural Information Processing Systems, 2020.
Markdown
[Ito et al. "Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/ito2020neurips-tight/)
BibTeX
@inproceedings{ito2020neurips-tight,
  title     = {{Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits}},
  author    = {Ito, Shinji and Hirahara, Shuichi and Soma, Tasuku and Yoshida, Yuichi},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/ito2020neurips-tight/}
}