Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
Abstract
Per-example gradient clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models. The choice of clipping threshold $R$, however, is vital for achieving high accuracy under DP. We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune $R$ for any DP optimizers, including DP-SGD, DP-Adam, DP-LAMB, and many others. The automatic variants are as private and computationally efficient as existing DP optimizers, but require no DP-specific hyperparameters and thus make DP training as amenable as standard non-private training. We give a rigorous convergence analysis of automatic DP-SGD in the non-convex setting, showing that it can enjoy an asymptotic convergence rate that matches that of standard SGD, under a symmetric gradient noise assumption on the per-sample gradients (commonly used in the non-DP literature). We demonstrate on various language and vision tasks that automatic clipping outperforms or matches the state-of-the-art, and can be easily employed with minimal changes to existing codebases.
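To make the idea concrete, here is a minimal, hedged sketch of one DP-SGD step with automatic (R-free) clipping, assuming the normalization-style rescaling $g_i / (\lVert g_i \rVert + \gamma)$ in place of the usual threshold-based factor $\min(1, R/\lVert g_i \rVert)$; the model, loss, and hyperparameter names below are illustrative, not the authors' reference implementation.

```python
import numpy as np

def dp_sgd_step_automatic(w, X, y, lr=0.1, gamma=0.01, sigma=1.0, rng=None):
    """One DP-SGD step with automatic (R-free) per-sample clipping.

    Each per-sample gradient g_i is rescaled by 1 / (||g_i|| + gamma)
    instead of min(1, R / ||g_i||), so no clipping threshold R needs tuning.
    Gaussian noise is added to the clipped sum before averaging, as in
    standard DP-SGD; every rescaled gradient has norm at most 1, so the
    noise standard deviation sigma corresponds to sensitivity 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    clipped_sum = np.zeros(d)
    for i in range(n):
        # per-sample gradient of the logistic loss for example i
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        g_i = (p - y[i]) * X[i]
        # automatic clipping: normalize by ||g_i|| + gamma (no threshold R)
        clipped_sum += g_i / (np.linalg.norm(g_i) + gamma)
    noisy_grad = (clipped_sum + sigma * rng.standard_normal(d)) / n
    return w - lr * noisy_grad

# toy usage: a few noisy steps of logistic regression on random binary data
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.0]) > 0).astype(float)
w = np.zeros(5)
for _ in range(10):
    w = dp_sgd_step_automatic(w, X, y, rng=rng)
```

Because the rescaling factor depends only on the gradient norm and a small stability constant $\gamma$, any choice of $R$ would be absorbed into the learning rate, which is why the method removes $R$ from the hyperparameter search entirely.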
Cite
Text
Bu et al. "Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger." Neural Information Processing Systems, 2023.
Markdown
[Bu et al. "Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/bu2023neurips-automatic/)
BibTeX
@inproceedings{bu2023neurips-automatic,
  title = {{Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger}},
  author = {Bu, Zhiqi and Wang, Yu-Xiang and Zha, Sheng and Karypis, George},
  booktitle = {Neural Information Processing Systems},
  year = {2023},
  url = {https://mlanthology.org/neurips/2023/bu2023neurips-automatic/}
}