Ensemble Sampling for Linear Bandits: Small Ensembles Suffice

Abstract

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $\smash{d \log T}$ incurs regret at most of the order $\smash{(d \log T)^{5/2} \sqrt{T}}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$---which defeats the purpose of ensemble sampling---while obtaining near $\smash{\sqrt{T}}$ order regret. Our result is also the first to allow for infinite action sets.

Cite

Text

Janz et al. "Ensemble Sampling for Linear Bandits: Small Ensembles Suffice." Neural Information Processing Systems, 2024. doi:10.52202/079017-0747

Markdown

[Janz et al. "Ensemble Sampling for Linear Bandits: Small Ensembles Suffice." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/janz2024neurips-ensemble/) doi:10.52202/079017-0747

BibTeX

@inproceedings{janz2024neurips-ensemble,
  title     = {{Ensemble Sampling for Linear Bandits: Small Ensembles Suffice}},
  author    = {Janz, David and Litvak, Alexander E. and Szepesvári, Csaba},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0747},
  url       = {https://mlanthology.org/neurips/2024/janz2024neurips-ensemble/}
}