Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

Abstract

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study \emph{sparse} regret bounds, that depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works focused on the case where $S$ is known, or the action sets satisfy additional assumptions. In this work, we obtain the first sparse regret bounds that hold when $S$ is unknown and the action sets are adversarially generated. Our techniques combine online to confidence set conversions with a novel randomized model selection approach over a hierarchy of nested confidence sets. When $S$ is known, our analysis recovers state-of-the-art bounds for adversarial action sets. We also show that a variant of our approach, using Exp3 to dynamically select the confidence sets, can be used to improve the empirical performance of stochastic linear bandits while enjoying a regret bound with optimal dependence on the time horizon.

Cite

Text

Jin et al. "Sparsity-Agnostic Linear Bandits with Adaptive Adversaries." Neural Information Processing Systems, 2024. doi:10.52202/079017-1329

Markdown

[Jin et al. "Sparsity-Agnostic Linear Bandits with Adaptive Adversaries." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/jin2024neurips-sparsityagnostic/) doi:10.52202/079017-1329

BibTeX

@inproceedings{jin2024neurips-sparsityagnostic,
  title     = {{Sparsity-Agnostic Linear Bandits with Adaptive Adversaries}},
  author    = {Jin, Tianyuan and Jang, Kyoungseok and Cesa-Bianchi, Nicolò},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1329},
  url       = {https://mlanthology.org/neurips/2024/jin2024neurips-sparsityagnostic/}
}