Near-Optimal Thompson Sampling-Based Algorithms for Differentially Private Stochastic Bandits
Abstract
We address differentially private stochastic bandits. We present two (near)-optimal Thompson Sampling-based learning algorithms: DP-TS and Lazy-DP-TS. The core idea in achieving optimality is the principle of optimism in the face of uncertainty: we reshape the posterior distribution optimistically compared to non-private Thompson Sampling. Our DP-TS achieves a $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \log \left(\frac{\log(T)}{\epsilon \cdot \Delta_j} \right) \right)$ regret bound, where $\mathcal{A}$ is the arm set, $\Delta_j$ is the sub-optimality gap of a sub-optimal arm $j$, and $\epsilon$ is the privacy parameter. Our Lazy-DP-TS removes the extra $\log$ factor by dropping observations. The regret of Lazy-DP-TS is $\sum\limits_{j \in \mathcal{A}: \Delta_j > 0} O \left(\frac{\log(T)}{\min \left\{\epsilon, \Delta_j \right\}} \right)$, which matches the regret lower bound. Additionally, we conduct experiments comparing the empirical performance of our proposed algorithms with existing optimal algorithms for differentially private stochastic bandits.
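For intuition only, here is a minimal sketch of one round of Laplace-noised, optimistically widened Thompson Sampling. This is not the paper's pseudocode: the Gaussian sampling distribution (standing in for a reshaped Beta posterior), the widening term, and the function `dp_ts_step` are our assumptions. Note also that the paper's algorithms control the privacy budget by releasing noisy statistics only at controlled points (Lazy-DP-TS additionally drops observations); naively adding fresh noise every round, as below, does not compose to $\epsilon$-DP over the whole horizon.

```python
import numpy as np

def dp_ts_step(sums, counts, epsilon, rng):
    """One arm-selection round of a Laplace-noised Thompson Sampling sketch.

    sums[j] / counts[j]: cumulative reward and pull count of arm j,
    with rewards assumed bounded in [0, 1] (so each sum has sensitivity 1).
    epsilon: privacy parameter of the Laplace mechanism, per release.
    """
    samples = np.empty(len(sums))
    for j in range(len(sums)):
        n = max(counts[j], 1)
        # Laplace noise with scale 1/epsilon makes this single release
        # of arm j's reward sum epsilon-differentially private.
        noisy_mean = (sums[j] + rng.laplace(scale=1.0 / epsilon)) / n
        # Optimistic widening (our heuristic): inflate the posterior by
        # both the sampling error (~1/sqrt(n)) and the privacy noise
        # (~1/(epsilon * n)), so the added noise cannot hide a good arm.
        width = np.sqrt(1.0 / n) + 1.0 / (epsilon * n)
        samples[j] = rng.normal(noisy_mean, width)
    return int(np.argmax(samples))

# Hypothetical usage with two arms:
rng = np.random.default_rng(0)
arm = dp_ts_step(sums=np.array([3.0, 5.0]),
                 counts=np.array([10, 12]),
                 epsilon=1.0, rng=rng)
```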
Cite
Text
Hu and Hegde. "Near-Optimal Thompson Sampling-Based Algorithms for Differentially Private Stochastic Bandits." Uncertainty in Artificial Intelligence, 2022.
Markdown
[Hu and Hegde. "Near-Optimal Thompson Sampling-Based Algorithms for Differentially Private Stochastic Bandits." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/hu2022uai-nearoptimal/)
BibTeX
@inproceedings{hu2022uai-nearoptimal,
title = {{Near-Optimal Thompson Sampling-Based Algorithms for Differentially Private Stochastic Bandits}},
author = {Hu, Bingshan and Hegde, Nidhi},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2022},
pages = {844--852},
volume = {180},
url = {https://mlanthology.org/uai/2022/hu2022uai-nearoptimal/}
}