Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds
Abstract
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: \Delta_i>0} \frac{\log T}{\Delta_i})$, where $\Delta_i$ is the suboptimality gap of arm $i$ and $T$ is the time horizon. On the other hand, Audibert et al. (2007) showed that in stochastic environments the regret bound can be tightened to $O(\sum_{i: \Delta_i>0} (\frac{\sigma_i^2}{\Delta_i} + 1) \log T )$ using the loss variance $\sigma_i^2$ of each arm $i$. In this paper, we propose an algorithm based on the follow-the-regularized-leader (FTRL) method, which employs adaptive learning rates that depend on the empirical prediction error of the loss. This is the first BOBW algorithm with gap-variance-dependent bounds, showing that variance information can be exploited even in possibly adversarial environments. Further, the leading constant factor in our gap-variance-dependent bound is only about twice that of the lower bound. In addition, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. Table 1 summarizes the achievable bounds in comparison with UCB-V (Audibert et al., 2007), Tsallis-INF (Zimmert and Seldin, 2021) and LB-INF (Ito, 2021).
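To make the difference between the two stochastic bounds concrete, the sketch below evaluates both expressions with all constant factors dropped. It is not the proposed algorithm, only arithmetic on the two formulas above. The function names `gap_bound` and `gap_variance_bound`, and the example gaps, variances and horizon, are hypothetical choices for illustration.

```python
import math

def gap_bound(gaps, T):
    """Gap-dependent bound: sum over arms with Delta_i > 0 of log(T) / Delta_i.
    Constant factors hidden by the O(.) are omitted."""
    return sum(math.log(T) / d for d in gaps if d > 0)

def gap_variance_bound(gaps, variances, T):
    """Gap-variance-dependent bound: sum over arms with Delta_i > 0 of
    (sigma_i^2 / Delta_i + 1) * log(T). Constant factors are omitted."""
    return sum((v / d + 1) * math.log(T)
               for d, v in zip(gaps, variances) if d > 0)

# Hypothetical instance: arm 0 is optimal (gap 0, so it is skipped),
# two suboptimal arms with gap 0.1 and low loss variance 0.01.
gaps = [0.0, 0.1, 0.1]
variances = [0.01, 0.01, 0.01]
T = 10**6

g = gap_bound(gaps, T)
gv = gap_variance_bound(gaps, variances, T)
print(f"gap bound: {g:.1f}, gap-variance bound: {gv:.1f}, ratio: {gv / g:.3f}")
```

In this low-variance instance each suboptimal arm contributes $(0.01/0.1 + 1)\log T = 1.1\log T$ instead of $10\log T$, so the ratio is $0.11$. Assuming losses lie in $[0,1]$ (an assumption not stated above), $\sigma_i^2 \le 1/4$ and $\Delta_i \le 1$, so the variance-dependent expression is never more than a constant factor larger than the gap-dependent one.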
Cite
Ito et al. "Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds." Conference on Learning Theory, 2022.
@inproceedings{ito2022colt-adversarially,
title = {{Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds}},
author = {Ito, Shinji and Tsuchiya, Taira and Honda, Junya},
booktitle = {Conference on Learning Theory},
year = {2022},
pages = {1421--1422},
volume = {178},
url = {https://mlanthology.org/colt/2022/ito2022colt-adversarially/}
}