On Private and Robust Bandits
Abstract
We study private and robust multi-armed bandits (MABs), where the agent receives heavy-tailed rewards contaminated under Huber's model and must simultaneously ensure differential privacy. We consider both the finite $k$-th raw moment and the finite $k$-th central moment settings for heavy-tailed reward distributions with $k\ge 2$. We first present a minimax lower bound, characterizing the information-theoretic limit of regret with respect to the privacy budget, contamination level, and heavy-tailedness. Then, we propose a meta-algorithm that builds on a private and robust mean estimation sub-routine \texttt{PRM}, which essentially relies on reward truncation and the Laplace mechanism. For the two heavy-tailed settings above, we give corresponding instantiations of \texttt{PRM} that enable us to achieve nearly optimal regret. Moreover, our two proposed truncation-based and histogram-based \texttt{PRM} schemes achieve the optimal trade-off between estimation accuracy, privacy, and robustness. Finally, we support our theoretical results and demonstrate the effectiveness of our algorithms with experimental studies.
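The abstract only describes \texttt{PRM} at a high level as combining reward truncation with the Laplace mechanism. Below is a minimal, hypothetical Python sketch of such a truncation-plus-Laplace mean estimate, not the paper's exact sub-routine: the function name, the fixed truncation threshold `B`, and the sensitivity bookkeeping are assumptions; in the paper the threshold would be tuned to the moment bound, contamination level, and privacy budget.

```python
import numpy as np

def prm_truncation_sketch(rewards, epsilon, B, rng=None):
    """Hypothetical truncation + Laplace mean estimate (illustrative only).

    rewards : 1-D array of observed (possibly contaminated, heavy-tailed) rewards
    epsilon : differential-privacy budget for this estimate
    B       : truncation threshold (assumed given; the paper's choice depends on
              the moment bound, contamination level, and privacy budget)
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.size
    # Clip rewards to [-B, B] so each sample has bounded influence, which both
    # limits the effect of contaminated outliers and bounds the DP sensitivity.
    clipped = np.clip(rewards, -B, B)
    # The clipped empirical mean changes by at most 2B / n when one reward changes,
    # so Laplace noise with scale 2B / (n * epsilon) yields epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=2.0 * B / (n * epsilon))
    return clipped.mean() + noise
```

A bandit meta-algorithm along the lines described in the abstract would call an estimator like this once per arm per phase and build confidence bounds around the returned private, robust mean.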
Cite
Text
Wu et al. "On Private and Robust Bandits." Neural Information Processing Systems, 2023.
Markdown
[Wu et al. "On Private and Robust Bandits." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/wu2023neurips-private-a/)
BibTeX
@inproceedings{wu2023neurips-private-a,
title = {{On Private and Robust Bandits}},
author = {Wu, Yulian and Zhou, Xingyu and Tao, Youming and Wang, Di},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/wu2023neurips-private-a/}
}