Differentially Private Multi-Armed Bandits in the Shuffle Model
Abstract
We give an $(\varepsilon,\delta)$-differentially private algorithm for the Multi-Armed Bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\varepsilon}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of action $a$, and $k$ is the total number of actions. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
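To make the guarantees above concrete, here is a minimal, hypothetical sketch of the kind of algorithm the abstract describes: batched successive elimination in which per-batch rewards reach the learner only through a private shuffle-style aggregator. Everything below (shuffled_sum, private_elimination, the randomized-response aggregator, the doubling batch schedule) is an illustrative assumption, not the authors' protocol; in particular, the aggregator treats the shuffled reports as plain local randomized response and ignores the privacy amplification that shuffling provides.

import numpy as np

rng = np.random.default_rng(0)

def shuffled_sum(rewards, epsilon):
    # Each user randomizes their {0,1} reward locally; the analyzer only
    # sees the shuffled multiset of reports, which here reduces to a count.
    # (Toy stand-in for a shuffle-model summation protocol; amplification
    # of epsilon by shuffling is ignored for simplicity.)
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)  # prob. of reporting truthfully
    truthful = rng.random(len(rewards)) < p
    reports = np.where(truthful, rewards, 1.0 - rewards)
    n, count = len(rewards), reports.sum()
    # E[count] = (2p-1)*s + (1-p)*n for true sum s, so debias:
    return (count - (1.0 - p) * n) / (2.0 * p - 1.0)

def private_elimination(means, T, epsilon):
    # Batched successive elimination; per-arm rewards are aggregated
    # only through the private sum above.
    k = len(means)
    active = list(range(k))
    pulls, sums = np.zeros(k), np.zeros(k)
    t, batch = 0, 1
    while len(active) > 1 and t + len(active) * batch <= T:
        for a in active:
            rewards = (rng.random(batch) < means[a]).astype(float)  # Bernoulli arms
            sums[a] += shuffled_sum(rewards, epsilon)
            pulls[a] += batch
            t += batch
        mu = sums[active] / pulls[active]
        rad = np.sqrt(np.log(T) / pulls[active])  # crude confidence radius
        keep = mu + rad >= (mu - rad).max()       # drop arms whose UCB < best LCB
        active = [a for a, ok in zip(active, keep) if ok]
        batch *= 2  # doubling batches keep private releases at O(log T) per arm
    return active

# Example: three Bernoulli arms; the best arm (index 0) should survive.
print(private_elimination(np.array([0.8, 0.5, 0.3]), T=50_000, epsilon=1.0))

The doubling batch schedule means each arm triggers only $O(\log T)$ private releases, which is one intuition for an additive privacy cost of the $\frac{k\log T}{\varepsilon}$ flavor appearing in the bounds above.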
Cite
Text
Tenenbaum et al. "Differentially Private Multi-Armed Bandits in the Shuffle Model." Neural Information Processing Systems, 2021.
Markdown
[Tenenbaum et al. "Differentially Private Multi-Armed Bandits in the Shuffle Model." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/tenenbaum2021neurips-differentially/)
BibTeX
@inproceedings{tenenbaum2021neurips-differentially,
title = {{Differentially Private Multi-Armed Bandits in the Shuffle Model}},
author = {Tenenbaum, Jay and Kaplan, Haim and Mansour, Yishay and Stemmer, Uri},
booktitle = {Neural Information Processing Systems},
year = {2021},
url = {https://mlanthology.org/neurips/2021/tenenbaum2021neurips-differentially/}
}