Scalable Thompson Sampling via Ensemble++
Abstract
Thompson Sampling is a principled, uncertainty-driven method for active exploration, but its real-world adoption is impeded by the high computational overhead of posterior maintenance in large-scale or non-conjugate settings. Ensemble-based approaches offer a partial remedy but often require a large ensemble size. This paper proposes Ensemble++, a scalable agent that sidesteps these limitations via a shared-factor ensemble update architecture and a random linear combination scheme. We theoretically show that in linear bandits, the Ensemble++ agent needs an ensemble size of only $\Theta(d \log T)$ to achieve regret guarantees comparable to exact Thompson Sampling. Further, to handle nonlinear rewards and complex environments, we introduce a neural extension that replaces fixed features with a learnable representation, preserving the same underlying objective via gradient-based updates. Empirical results confirm that the Ensemble++ agent excels in both sample efficiency and computational scalability across linear and nonlinear environments, including GPT-based contextual bandits for automated content moderation -- a safety-critical online decision-making task for foundation models.
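For intuition, below is a minimal sketch (in NumPy) of the idea the abstract describes: an agent maintains a small set of shared ensemble factors and draws an approximate posterior sample by taking a random linear combination of them, rather than maintaining an exact posterior. The class name, the incremental factor-refresh rule, and all hyperparameters are our own illustrative assumptions, not the paper's exact Ensemble++ algorithm.

import numpy as np

class SharedFactorTS:
    """Sketch of ensemble-based Thompson Sampling for a linear bandit
    using shared ensemble factors and random linear combinations
    (illustrative; not the paper's exact Ensemble++ update rule)."""

    def __init__(self, d, M, lam=1.0, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.sigma = sigma
        self.V = lam * np.eye(d)   # regularized Gram matrix (shared statistic)
        self.b = np.zeros(d)       # reward-weighted feature sum (shared statistic)
        # Shared ensemble factors: each column is one perturbation direction.
        self.A = self.rng.standard_normal((d, M)) / np.sqrt(M)

    def sample_parameter(self):
        """Ridge mean plus a random linear combination of the M factors."""
        mu = np.linalg.solve(self.V, self.b)
        zeta = self.rng.standard_normal(self.A.shape[1])  # combination weights
        return mu + np.linalg.solve(self.V, self.A @ zeta)

    def update(self, x, r):
        """Incremental update after observing feature x and reward r."""
        self.V += np.outer(x, x)
        self.b += r * x
        # Refresh one factor with a noise-scaled copy of x (an assumed,
        # simplified stand-in for the shared-factor ensemble update).
        j = self.rng.integers(self.A.shape[1])
        self.A[:, j] += self.sigma * self.rng.standard_normal() * x

# Toy usage: feature dimension d = 5; ensemble size M on the order of
# d log T, matching the abstract's theoretical guidance.
d, M, T, K = 5, 32, 200, 10
agent = SharedFactorTS(d, M)
rng = np.random.default_rng(1)
theta_star = rng.standard_normal(d)
for t in range(T):
    arms = rng.standard_normal((K, d))    # context-dependent arm features
    theta = agent.sample_parameter()
    x = arms[np.argmax(arms @ theta)]     # act greedily w.r.t. the sample
    r = x @ theta_star + 0.1 * rng.standard_normal()
    agent.update(x, r)

The key scalability point the sketch illustrates is that sampling and updating cost only a matrix solve and rank-one updates over M shared factors, with no per-step posterior recomputation.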
BibTeX
@inproceedings{li2025iclrw-scalable,
title = {{Scalable Thompson Sampling via Ensemble++}},
author = {Li, Yingru and Xu, Jiawei and Wang, Baoxiang and Luo, Zhi-Quan},
booktitle = {ICLR 2025 Workshops: FPI},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/li2025iclrw-scalable/}
}