Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms

Mathieu, Timothée; Basu, Debabrota; Maillard, Odalric-Ambrym

TMLR 2024

Abstract

We study the corrupted bandit problem, i.e., a stochastic multi-armed bandit problem with $k$ unknown reward distributions that are heavy-tailed and corrupted by a history-independent adversary, or Nature. Specifically, the reward obtained by playing an arm is drawn from the corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and from an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide a problem-dependent lower bound on the regret of any corrupted bandit algorithm. The lower bounds indicate that the corrupted bandit problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tailed rewards. Next, we propose a novel UCB-type algorithm for corrupted bandits, HubUCB, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality for Huber's estimator, we prove that HubUCB achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator with linear complexity. We use this sequential estimator to design SeqHubUCB, which enjoys similar regret guarantees at a reduced computational cost. Finally, we experimentally illustrate the efficiency of HubUCB and SeqHubUCB in solving corrupted bandits for different reward distributions and different levels of corruption.
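
The two ingredients described above, the $\varepsilon$-corruption reward model and Huber's robust mean estimator, can be illustrated with a short Python sketch. It assumes Student-t inliers with 3 degrees of freedom, a shifted-exponential corruption far from the true mean, and a threshold `beta = 1.0`; these choices and the helper names `huber_mean`, `student_t` and `corrupted_reward` are hypothetical and not taken from the paper. The estimator is computed here by iteratively reweighted least squares (IRLS), a generic solver for Huber's M-estimator, not the paper's sequential estimator used by SeqHubUCB.

```python
import random
import statistics


def huber_mean(xs, beta, tol=1e-9, max_iter=200):
    """Huber M-estimate of location, computed by iteratively reweighted least squares.

    The fixed point satisfies sum_i psi(x_i - mu) = 0, where
    psi(r) = clip(r, -beta, beta) is the derivative of Huber's loss.
    """
    mu = statistics.median(xs)  # robust starting point
    for _ in range(max_iter):
        # Weight 1 in the quadratic zone |r| <= beta and beta/|r| in the linear zone,
        # so that w * r equals psi(r) for every residual r.
        weights = [1.0 if abs(x - mu) <= beta else beta / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def student_t(rng, df):
    """Heavy-tailed inlier: Student-t draw built from a Gaussian and a chi-square."""
    chi2 = 2.0 * rng.gammavariate(df / 2.0, 1.0)  # chi-square with df degrees of freedom
    return rng.gauss(0.0, 1.0) / (chi2 / df) ** 0.5


def corrupted_reward(rng, mean, eps, df=3.0):
    """Hypothetical corrupted arm: heavy-tailed inlier w.p. 1 - eps, outlier w.p. eps."""
    if rng.random() < eps:
        # Illustrative corruption with unbounded support (shifted exponential).
        return 100.0 + rng.expovariate(0.1)
    return mean + student_t(rng, df)


rng = random.Random(0)
rewards = [corrupted_reward(rng, mean=0.0, eps=0.1) for _ in range(2000)]
empirical = statistics.fmean(rewards)
huber_est = huber_mean(rewards, beta=1.0)
print(f"empirical mean: {empirical:.2f}, Huber estimate: {huber_est:.2f}")
```

On such data the empirical mean is dragged far toward the outliers, while the Huber estimate stays close to the true mean of 0, with a small residual bias because corrupted samples still pull on it through their clipped residuals. A UCB-type strategy built on Huber's estimator would use an estimate like this in place of the empirical mean in its optimistic index; the confidence bonus HubUCB adds, which must account for $\varepsilon$ and the heavy tails, is not reproduced here.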

Cite

Text

Mathieu et al. "Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms." Transactions on Machine Learning Research, 2024.

Markdown

[Mathieu et al. "Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/mathieu2024tmlr-bandits/)

BibTeX

@article{mathieu2024tmlr-bandits,
  title     = {{Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms}},
  author    = {Mathieu, Timothée and Basu, Debabrota and Maillard, Odalric-Ambrym},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/mathieu2024tmlr-bandits/}
}