LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Abstract
We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap than in previous studies. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+\beta}{2+\beta}}T^{\frac{1}{2+\beta}}\right)$ regret under the margin condition.
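The core update behind Tsallis-INF-style methods is an FTRL step over the probability simplex with an $\alpha$-Tsallis-entropy regularizer. The sketch below is illustrative only (it is not the paper's full LC-Tsallis-INF algorithm, which additionally handles contexts and loss estimation): by the KKT conditions, the minimizer satisfies $p_i = \big(\tfrac{1-\alpha}{\alpha}\,\eta\,(L_i + z)\big)^{-1/(1-\alpha)}$ for cumulative (estimated) losses $L_i$, where the Lagrange multiplier $z$ is chosen so the probabilities sum to one; the helper name `tsallis_ftrl_probs` and the bisection tolerance are our own choices.

```python
import numpy as np

def tsallis_ftrl_probs(cum_losses, eta, alpha=0.5, iters=100):
    """One FTRL step with alpha-Tsallis-entropy regularization (a sketch).

    Solves p_i = (c * eta * (L_i + z)) ** (-1 / (1 - alpha)), where
    c = (1 - alpha) / alpha comes from the KKT conditions and the
    Lagrange multiplier z is found by bisection so that sum(p) == 1.
    """
    L = np.asarray(cum_losses, dtype=float)
    c = (1.0 - alpha) / alpha
    expo = -1.0 / (1.0 - alpha)

    def probs(z):
        return (c * eta * (L + z)) ** expo

    # sum(probs(z)) decreases monotonically in z; bracket the root.
    lo = -L.min() + 1e-12          # probabilities blow up near this point
    hi = lo + 1.0
    while probs(hi).sum() > 1.0:   # expand until the sum drops below 1
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):         # plain bisection on the multiplier z
        mid = 0.5 * (lo + hi)
        if probs(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = probs(0.5 * (lo + hi))
    return p / p.sum()             # tiny renormalization for safety
```

With $\alpha = 1/2$ this recovers the familiar Tsallis-INF form $p_i \propto (\eta (L_i + z))^{-2}$; arms with smaller cumulative loss receive larger sampling probability.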
Cite
Text
Kato and Ito. "LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.
BibTeX
@inproceedings{kato2025aistats-lctsallisinf,
title = {{LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits}},
author = {Kato, Masahiro and Ito, Shinji},
booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
year = {2025},
pages = {3655--3663},
volume = {258},
url = {https://mlanthology.org/aistats/2025/kato2025aistats-lctsallisinf/}
}