When and Why Randomised Exploration Works (in Linear Bandits)

Abstract

We provide an approach for analysing randomised exploration algorithms, such as Thompson sampling, that does not rely on forced optimism or posterior inflation. With this, we demonstrate that in the $d$-dimensional linear bandit setting, when the action space is smooth and strongly convex, randomised exploration algorithms enjoy an $n$-step regret bound of order $O(d\sqrt{n} \log(n))$. Notably, this shows for the first time that there exist non-trivial linear bandit settings where Thompson sampling can achieve optimal dimension dependence in the regret.
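As a rough illustration of the setting the abstract describes, here is a minimal sketch of vanilla Thompson sampling on a linear bandit whose action set is the unit Euclidean ball, the simplest example of a smooth, strongly convex set matching the abstract's condition. This is not the paper's analysis or algorithmic contribution: the Gaussian posterior, the noise level, the prior scale, and the choice of action set are all assumptions made for this toy example. The posterior is deliberately left uninflated, in the spirit of the abstract.

```python
import numpy as np

# Illustrative sketch only: Thompson sampling on a d-dimensional linear bandit
# whose action set is the unit Euclidean ball (a smooth, strongly convex set).
# The true parameter, noise level, and prior scale are assumptions for this toy
# example, not details taken from the paper.

rng = np.random.default_rng(0)
d, n, noise_sd, lam = 5, 2000, 0.1, 1.0

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown parameter, placed on the unit sphere

V = lam * np.eye(d)                        # regularised design matrix
b = np.zeros(d)                            # running sum of action * reward
regret = 0.0

for t in range(n):
    theta_hat = np.linalg.solve(V, b)              # ridge estimate of theta_star
    cov = noise_sd**2 * np.linalg.inv(V)           # Gaussian posterior covariance, no inflation
    theta_tilde = rng.multivariate_normal(theta_hat, cov)

    # On the unit ball, the reward-maximising action for theta_tilde is its direction.
    x = theta_tilde / max(np.linalg.norm(theta_tilde), 1e-12)
    reward = x @ theta_star + noise_sd * rng.normal()

    V += np.outer(x, x)
    b += reward * x
    regret += np.linalg.norm(theta_star) - x @ theta_star

print(f"cumulative regret after {n} rounds: {regret:.2f}")
```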

Cite

Text

Abeille et al. "When and Why Randomised Exploration Works (in Linear Bandits)." Proceedings of The 36th International Conference on Algorithmic Learning Theory, 2025.

Markdown

[Abeille et al. "When and Why Randomised Exploration Works (in Linear Bandits)." Proceedings of The 36th International Conference on Algorithmic Learning Theory, 2025.](https://mlanthology.org/alt/2025/abeille2025alt-randomised/)

BibTeX

@inproceedings{abeille2025alt-randomised,
  title     = {{When and Why Randomised Exploration Works (in Linear Bandits)}},
  author    = {Abeille, Marc and Janz, David and Pike-Burke, Ciara},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  year      = {2025},
  pages     = {4--22},
  volume    = {272},
  url       = {https://mlanthology.org/alt/2025/abeille2025alt-randomised/}
}