Scalable Thompson Sampling via Ensemble++

Abstract

Thompson Sampling is a principled uncertainty-driven method for active exploration, but its real-world adoption is impeded by the high computational overhead of posterior maintenance in large-scale or non-conjugate settings. Ensemble-based approaches offer partial remedies, but often require a large ensemble size. This paper proposes Ensemble++, a scalable agent that sidesteps these limitations via a shared-factor ensemble update architecture and a random linear combination scheme. We theoretically show that in linear bandits, the Ensemble++ agent needs an ensemble size of only $\Theta(d \log T)$ to achieve regret guarantees comparable to exact Thompson Sampling. Further, to handle nonlinear rewards and complex environments, we introduce a neural extension that replaces fixed features with a learnable representation, preserving the same underlying objective via gradient-based updates. Empirical results confirm that the Ensemble++ agent excels in both sample efficiency and computational scalability across linear and nonlinear environments, including GPT-based contextual bandits for automated content moderation -- a safety-critical foundation model online decision-making task.
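To make the mechanism concrete, below is a minimal numpy sketch of a perturbation-based ensemble Thompson Sampling agent for a d-dimensional linear bandit, in the spirit of the shared-factor ensemble update and random-linear-combination scheme described in the abstract. The class name EnsemblePlusPlusLinear, the Gaussian perturbation scheme, and all scaling choices are illustrative assumptions and do not reproduce the paper's exact update rules.

import numpy as np

class EnsemblePlusPlusLinear:
    """Illustrative ensemble Thompson Sampling for a d-dimensional linear
    bandit with m ensemble components (a sketch of the idea in the abstract,
    not the paper's exact algorithm)."""

    def __init__(self, d, m, lam=1.0, noise_std=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.d, self.m, self.noise_std = d, m, noise_std
        self.precision = lam * np.eye(d)         # regularized precision matrix
        self.b = np.zeros(d)                     # accumulated r_t * x_t / sigma^2
        # One perturbation column per ensemble member ("shared-factor" heads).
        self.Z = np.sqrt(lam) * self.rng.normal(size=(d, m))

    def sample_parameter(self):
        cov = np.linalg.inv(self.precision)      # O(d^3); acceptable for a sketch
        mu = cov @ self.b                        # shared mean estimate
        heads = cov @ self.Z                     # m perturbed least-squares solutions
        # A random linear combination of the heads acts as a posterior sample.
        zeta = self.rng.normal(size=self.m) / np.sqrt(self.m)
        return mu + heads @ zeta

    def select_arm(self, contexts):
        """contexts: (K, d) array of candidate arm features."""
        theta = self.sample_parameter()
        return int(np.argmax(contexts @ theta))

    def update(self, x, r):
        self.precision += np.outer(x, x) / self.noise_std ** 2
        self.b += r * x / self.noise_std ** 2
        # Each head also receives an independent reward perturbation so the
        # ensemble tracks approximate posterior uncertainty over time.
        self.Z += np.outer(x, self.rng.normal(size=self.m)) / self.noise_std

In this reading of the abstract, the $\Theta(d \log T)$ result would correspond to choosing the number of heads m proportional to the feature dimension and the log of the horizon, while the neural extension would replace the fixed features x with a learned representation trained by gradient descent on the same objective.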

Cite

Text

Li et al. "Scalable Thompson Sampling via Ensemble++." ICLR 2025 Workshops: FPI, 2025.

Markdown

[Li et al. "Scalable Thompson Sampling via Ensemble++." ICLR 2025 Workshops: FPI, 2025.](https://mlanthology.org/iclrw/2025/li2025iclrw-scalable/)

BibTeX

@inproceedings{li2025iclrw-scalable,
  title     = {{Scalable Thompson Sampling via Ensemble++}},
  author    = {Li, Yingru and Xu, Jiawei and Wang, Baoxiang and Luo, Zhi-Quan},
  booktitle = {ICLR 2025 Workshops: FPI},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/li2025iclrw-scalable/}
}