Connecting Thompson Sampling and UCB: Towards More Efficient Trade-Offs Between Privacy and Regret
Abstract
We address differentially private stochastic bandit problems by leveraging Thompson Sampling with Gaussian priors and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private algorithm that enables trading off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and achieves $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bounds, where $K$ is the number of arms, $ \Delta$ is the sub-optimality gap, $T$ is the learning horizon, and $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for the Gaussian distributions, linking the exploration mechanisms of Thompson Sampling and Upper Confidence Bound, which may be of independent research interest.
Cite
Text
Hu et al. "Connecting Thompson Sampling and UCB: Towards More Efficient Trade-Offs Between Privacy and Regret." Proceedings of the 42nd International Conference on Machine Learning, 2025.Markdown
[Hu et al. "Connecting Thompson Sampling and UCB: Towards More Efficient Trade-Offs Between Privacy and Regret." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/hu2025icml-connecting/)BibTeX
@inproceedings{hu2025icml-connecting,
title = {{Connecting Thompson Sampling and UCB: Towards More Efficient Trade-Offs Between Privacy and Regret}},
author = {Hu, Bingshan and Huang, Zhiming and Zhang, Tianyue H. and Lécuyer, Mathias and Hegde, Nidhi},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {24403-24423},
volume = {267},
url = {https://mlanthology.org/icml/2025/hu2025icml-connecting/}
}