Adversarial Contextual Bandits Go Kernelized
Abstract
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue-decay assumptions on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent $c>1$, the regret is $\tilde{O}\big(K T^{\frac{1}{2}\left(1+\frac{1}{c}\right)}\big)$, where $T$ denotes the number of rounds and $K$ the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of $\tilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the better-studied stochastic counterpart of our problem.
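For reference, the eigenvalue-decay conditions mentioned above are usually formalized in terms of the Mercer eigenvalues $(\mu_j)_{j\ge 1}$ of the kernel. The following is a sketch under that standard formulation; the constant $C$ and the exact normalization are illustrative assumptions and are not taken from the paper, while the resulting rates are the ones stated in the abstract:
\[
\text{polynomial decay: } \mu_j \le C\, j^{-c},\ c>1 \;\Longrightarrow\; R_T = \tilde{O}\Big(K\, T^{\frac{1}{2}\left(1+\frac{1}{c}\right)}\Big),
\qquad
\text{exponential decay: } \mu_j \le C\, e^{-c j} \;\Longrightarrow\; R_T = \tilde{O}\big(\sqrt{T}\big).
\]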
Cite
Text

Neu et al. "Adversarial Contextual Bandits Go Kernelized." Proceedings of The 35th International Conference on Algorithmic Learning Theory, 2024.

BibTeX
@inproceedings{neu2024alt-adversarial,
title = {{Adversarial Contextual Bandits Go Kernelized}},
author = {Neu, Gergely and Olkhovskaya, Julia and Vakili, Sattar},
booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
year = {2024},
pages = {907--929},
volume = {237},
url = {https://mlanthology.org/alt/2024/neu2024alt-adversarial/}
}