Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms
Abstract
We study the corrupted bandit problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent adversary or Nature. To be specific, the reward obtained by playing an arm comes from the corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and from an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide \textit{a problem-dependent lower bound on the regret} of any corrupted bandit algorithm. The lower bounds indicate that the corrupted bandit problem is harder than the classical stochastic bandit problem with subGaussian or heavy-tailed rewards. Following that, we propose a novel UCB-type algorithm for corrupted bandits, namely \texttt{HubUCB}, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality for Huber's estimator, we prove that \texttt{HubUCB} achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design \texttt{SeqHubUCB}, which enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of \texttt{HubUCB} and \texttt{SeqHubUCB} in solving corrupted bandits for different reward distributions and different levels of corruption.
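To make the corruption model concrete, the following is a minimal sketch of a single arm whose rewards are replaced by corrupted values with probability $\varepsilon$, together with a textbook Huber location estimate. It is not the authors' \texttt{HubUCB} or their sequential estimator. The function names, the Gaussian reward distribution (used here for simplicity instead of a heavy-tailed one), the constant outlier value, and the threshold `beta` are all illustrative assumptions.

```python
import random
import statistics


def corrupted_sample(rng, eps, draw_reward, draw_corruption):
    """Draw from the corruption distribution w.p. eps, else from the reward distribution."""
    if rng.random() < eps:
        return draw_corruption(rng)
    return draw_reward(rng)


def huber_mean(xs, beta, iters=200, tol=1e-10):
    """Huber location estimate: a root of sum(clip(x - mu, -beta, beta)) = 0.

    Solved by a simple fixed-point iteration (an assumption of this sketch,
    not the procedure analysed in the paper).
    """
    mu = statistics.median(xs)  # robust starting point
    for _ in range(iters):
        # Average of the clipped residuals; zero at the Huber estimate.
        step = sum(max(-beta, min(beta, x - mu)) for x in xs) / len(xs)
        mu += step
        if abs(step) < tol:
            break
    return mu


rng = random.Random(0)
true_mean, eps = 1.0, 0.1  # hypothetical arm mean and corruption level
samples = [
    corrupted_sample(
        rng,
        eps,
        draw_reward=lambda r: r.gauss(true_mean, 1.0),  # assumed inlier law
        draw_corruption=lambda r: 1000.0,               # assumed outlier value
    )
    for _ in range(2000)
]

empirical = statistics.fmean(samples)      # pulled far away by the outliers
robust = huber_mean(samples, beta=1.5)     # stays close to true_mean
print(f"empirical mean: {empirical:.2f}, Huber estimate: {robust:.2f}")
```

In this toy setting the empirical mean is dominated by the corrupted draws, while the Huber estimate remains near the true mean with a small bias that grows with $\varepsilon$. A UCB-type index would then be built on top of such a robust estimate rather than on the empirical mean.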
Cite
Text
Mathieu et al. "Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms." Transactions on Machine Learning Research, 2024.
Markdown
[Mathieu et al. "Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/mathieu2024tmlr-bandits/)
BibTeX
@article{mathieu2024tmlr-bandits,
title = {{Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms}},
author = {Mathieu, Timothée and Basu, Debabrota and Maillard, Odalric-Ambrym},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/mathieu2024tmlr-bandits/}
}