(ε, U)-Adaptive Regret Minimization in Heavy-Tailed Bandits
Abstract
Heavy-tailed distributions naturally arise in several settings, from finance to telecommunications. While regret minimization under subgaussian or bounded rewards has been widely studied, learning with heavy-tailed distributions has gained popularity only over the last decade. In this paper, we consider the setting in which the reward distributions have finite absolute raw moments of maximum order $1+\epsilon$, uniformly bounded by a constant $u<+\infty$, for some $\epsilon \in (0,1]$. In this setting, we study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner, which has to adapt to them. First, we show that adaptation comes at a cost, and we derive two negative results proving that the regret guarantees of the non-adaptive case cannot be achieved without further assumptions. Then, we devise and analyze a fully data-driven trimmed mean estimator and propose a novel adaptive regret minimization algorithm, \texttt{AdaR-UCB}, that leverages such an estimator. Finally, we show that \texttt{AdaR-UCB} is the first algorithm that, under a known distributional assumption, enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
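For context on the estimator the abstract refers to, below is a minimal Python sketch of the classical *non-adaptive* pipeline for this setting: the truncated empirical mean and the Robust-UCB-style index of Bubeck et al. (2013), both of which require $\epsilon$ and $u$ as inputs. The paper's contribution is precisely to remove this requirement via a fully data-driven trimming rule, which this sketch does not implement; the function names and the Student-t example are illustrative assumptions, not the authors' code.

```python
import numpy as np

def trimmed_mean(samples, epsilon, u, delta):
    """Truncated empirical mean for heavy-tailed samples.

    Classical non-adaptive estimator (Bubeck et al., 2013): the i-th
    sample is kept only if |x_i| <= (u * i / log(1/delta))**(1/(1+epsilon)).
    Needs epsilon and u; AdaR-UCB instead uses a fully data-driven
    trimming threshold, which this sketch does not implement.
    """
    x = np.asarray(samples, dtype=float)
    n = len(x)
    if n == 0:
        return 0.0
    idx = np.arange(1, n + 1)
    thresholds = (u * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + epsilon))
    kept = np.where(np.abs(x) <= thresholds, x, 0.0)  # trimmed samples count as 0
    return kept.sum() / n

def robust_ucb_index(samples, epsilon, u, t):
    """Robust-UCB-style optimistic index with delta = t**-2.

    Upper-confidence bound built on the truncated mean above; the
    width matches the concentration bound of Bubeck et al. (2013).
    """
    n = len(samples)
    delta = t ** -2.0
    mean = trimmed_mean(samples, epsilon, u, delta)
    width = (4.0 * u ** (1.0 / (1.0 + epsilon))
             * (np.log(1.0 / delta) / n) ** (epsilon / (1.0 + epsilon)))
    return mean + width

# Illustrative usage: Student-t rewards with df=2.5 have finite
# (1+epsilon)-th moments for epsilon < 1.5; u = 5.0 is an assumed bound.
rng = np.random.default_rng(0)
rewards = rng.standard_t(df=2.5, size=100)
print(robust_ucb_index(rewards, epsilon=0.4, u=5.0, t=100))
```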
Cite
Text
Genalti et al. "(ε, U)-Adaptive Regret Minimization in Heavy-Tailed Bandits." Conference on Learning Theory, 2024.
Markdown
[Genalti et al. "(ε, U)-Adaptive Regret Minimization in Heavy-Tailed Bandits." Conference on Learning Theory, 2024.](https://mlanthology.org/colt/2024/genalti2024colt-adaptive/)
BibTeX
@inproceedings{genalti2024colt-adaptive,
title = {{(ε, U)-Adaptive Regret Minimization in Heavy-Tailed Bandits}},
author = {Genalti, Gianmarco and Marsigli, Lupo and Gatti, Nicola and Metelli, Alberto Maria},
booktitle = {Conference on Learning Theory},
year = {2024},
pages = {1882--1915},
volume = {247},
url = {https://mlanthology.org/colt/2024/genalti2024colt-adaptive/}
}