On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice
Abstract
This paper examines two extensions of the multi-armed bandit problem: multi-armed bandits with expert advice and contextual linear bandits. For the former problem, multi-armed bandits with expert advice, the best previously known upper and lower bounds have been $O(\sqrt{KT \log \frac{N}{K} })$ and $\Omega( \sqrt{KT \frac{ \log N }{\log K }} )$, respectively. Here, $K$, $N$, and $T$ denote the numbers of arms, experts, and rounds, respectively. We provide a lower bound of $\Omega( \sqrt{KT \log \frac{N}{K}} )$ for the setup in which the player chooses an expert before observing the advice in each round. For the latter problem, contextual linear bandits, we provide an algorithm that achieves $O ( \sqrt{d T \log ( K \min\{ 1, \frac{S}{d} \} )} )$ together with a matching lower bound, where $d$ and $S$ denote the dimensionality of the feature vectors and the size of the context space, respectively.
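As background for the expert-advice setting, the following is a minimal sketch of the classical EXP4 algorithm (Auer et al.), which attains $O(\sqrt{KT \log N})$ expected regret; the bounds above concern how far this $\log N$ factor can be tightened toward $\log \frac{N}{K}$. This is an illustrative sketch, not the paper's algorithm; the environment callbacks `get_advice` and `get_loss` are hypothetical placeholders.

```python
import numpy as np

def exp4(T, K, N, get_advice, get_loss, rng=None):
    """Minimal EXP4 sketch for multi-armed bandits with expert advice.

    get_advice(t) -> (N, K) array; row i is expert i's distribution over arms.
    get_loss(t, arm) -> loss in [0, 1] of the pulled arm.
    Both callbacks are hypothetical stand-ins for the environment.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    eta = np.sqrt(np.log(N) / (K * T))  # standard order for the learning rate
    log_w = np.zeros(N)                 # log-weights over the N experts
    for t in range(T):
        advice = get_advice(t)                    # (N, K) advice matrix
        q = np.exp(log_w - log_w.max())
        q /= q.sum()                              # distribution over experts
        p = q @ advice                            # induced distribution over arms
        arm = rng.choice(K, p=p)
        loss = get_loss(t, arm)
        loss_hat = np.zeros(K)
        loss_hat[arm] = loss / p[arm]             # importance-weighted loss estimate
        expert_loss = advice @ loss_hat           # estimated loss of each expert
        log_w -= eta * expert_loss                # exponential-weights update
    return log_w
```

Each round, the arm distribution is the weight-averaged advice, and importance weighting keeps the per-expert loss estimates unbiased, which is what yields the $O(\sqrt{KT \log N})$ guarantee.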
Cite
Text
Ito. "On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice." Neural Information Processing Systems, 2024. doi:10.52202/079017-1974Markdown
[Ito. "On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/ito2024neurips-minimax/) doi:10.52202/079017-1974BibTeX
@inproceedings{ito2024neurips-minimax,
title = {{On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice}},
author = {Ito, Shinji},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1974},
url = {https://mlanthology.org/neurips/2024/ito2024neurips-minimax/}
}