Best-of-Both-Worlds Linear Contextual Bandits
Abstract
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption. At each round, a decision-maker observes an independent and identically distributed context and then selects an arm based on the context and past observations. After selecting an arm, the decision-maker incurs the loss corresponding to that arm and aims to minimize the cumulative loss over the trial. The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We first formulate the problem by introducing a novel setting of bandits with adversarial corruption, referred to as the contextual adversarial regime with a self-bounding constraint, and we assume linear models for the relationship between the loss and the context. We then propose a strategy that extends the {\tt RealLinExp3} algorithm of \citet{Neu2020} and the Follow-The-Regularized-Leader (FTRL) approach. The regret of the proposed algorithm is shown to be upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}} + \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\ \ \sqrt{T}(\log(T))^2\right\}\right)$, where $T \in\mathbb{N}$ is the number of rounds, $\Delta_{*} > 0$ is the constant minimum gap between the best and suboptimal arms for any context, and $C\in[0, T]$ is an adversarial corruption parameter. This upper bound implies a regret of $O\left(\frac{(\log(T))^3}{\Delta_{*}}\right)$ in a stochastic environment and $O\left(\sqrt{T}(\log(T))^2\right)$ in an adversarial environment. We refer to our strategy as the {\tt Best-of-Both-Worlds (BoBW) RealFTRL}, owing to its theoretical guarantees in both the stochastic and adversarial regimes.
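As a concrete illustration of the protocol described above, the following is a minimal, hypothetical Python sketch: i.i.d. contexts, $K$ arms whose losses are linear in the context, and a generic FTRL update in which a negative-entropy regularizer over the arms, combined with importance-weighted loss estimates, reduces to exponential weights. This is not the paper's {\tt BoBW RealFTRL}; in particular, the policy below ignores the observed context, which is precisely what the {\tt RealLinExp3}-style estimators of \citet{Neu2020} are designed to exploit. All constants and variable names are illustrative assumptions.

```python
# Minimal sketch (NOT the paper's BoBW RealFTRL): i.i.d. contexts, linear
# losses, and an FTRL / exponential-weights update over K arms with
# importance-weighted loss estimates. All constants are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
T, K, d = 10_000, 5, 3                      # rounds, arms, context dimension
theta = rng.uniform(0.0, 1.0, (K, d)) / d   # hypothetical per-arm loss model
eta = np.sqrt(np.log(K) / T)                # a standard FTRL learning rate
gamma = 0.01                                # uniform exploration floor

cum_est = np.zeros(K)                       # cumulative loss estimates
cum_loss = 0.0
for t in range(T):
    x = np.abs(rng.normal(size=d))          # i.i.d. context, kept nonnegative
    x /= x.sum()                            # and normalized for boundedness
    # FTRL with a negative-entropy regularizer = exponential weights;
    # shifting by the minimum avoids underflow in np.exp.
    w = np.exp(-eta * (cum_est - cum_est.min()))
    p = (1 - gamma) * w / w.sum() + gamma / K
    a = rng.choice(K, p=p)                  # sample an arm from the policy
    loss = float(theta[a] @ x)              # linear loss of the chosen arm
    cum_loss += loss
    cum_est[a] += loss / p[a]               # importance-weighted estimate
```

Roughly speaking, a contextual extension would replace the scalar estimates {\tt cum\_est} with regression-based estimates of the per-arm linear parameters, so that the exponential-weights distribution can depend on the observed context; the paper combines such estimators with an FTRL scheme whose regret adapts to the corruption level $C$.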
Cite

Kato and Ito. "Best-of-Both-Worlds Linear Contextual Bandits." Transactions on Machine Learning Research, 2024. https://mlanthology.org/tmlr/2024/kato2024tmlr-bestofbothworlds/

BibTeX
@article{kato2024tmlr-bestofbothworlds,
title = {{Best-of-Both-Worlds Linear Contextual Bandits}},
author = {Kato, Masahiro and Ito, Shinji},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/kato2024tmlr-bestofbothworlds/}
}