Can Stochastic Weight Averaging Improve Generalization in Private Learning?

Abstract

We investigate stochastic weight averaging (SWA) for private learning in the context of generalization and model performance. Differentially private (DP) optimizers are known to suffer from reduced performance and high variance in comparison to non-private learning. However, the generalization properties of DP optimizers have received little attention, particularly for large-scale machine learning models. SWA is a variant of stochastic gradient descent (SGD) that averages the weights along the SGD trajectory. We consider a DP adaptation of SWA (DP-SWA) which incurs no additional privacy cost and has little computational overhead. For quadratic objective functions, we show that DP-SWA converges to the optimum at the same rate as non-private SGD, which implies that the excess risk converges to zero. For non-convex objective functions, we observe across multiple experiments on standard benchmark datasets that averaging model weights improves generalization and model accuracy and reduces performance variance.
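The key point of DP-SWA is that averaging the iterates is a data-independent post-processing of already-private weights, so it adds no privacy cost. Below is a minimal PyTorch sketch of this idea, not the authors' implementation: the function dp_sgd_step is a hypothetical stand-in for any DP-SGD update (per-sample gradient clipping plus Gaussian noise), and swa_start marks the epoch after which averaging begins.

# Minimal sketch of DP-SWA as post-processing on a DP-SGD trajectory.
# `dp_sgd_step` is a hypothetical placeholder for a DP-SGD update
# (per-sample clipping + noise); it is NOT the paper's code.
import copy
import torch

def train_dp_swa(model, loader, dp_sgd_step, epochs, swa_start):
    """Run DP-SGD and maintain a running average of the weights.

    Averaging the private iterates is post-processing, so it incurs
    no additional privacy cost beyond that of DP-SGD itself.
    """
    swa_model = copy.deepcopy(model)   # holds the averaged weights
    n_averaged = 0
    for epoch in range(epochs):
        for batch in loader:
            dp_sgd_step(model, batch)  # private update of `model`
        if epoch >= swa_start:
            # Running average: w_swa <- (n * w_swa + w) / (n + 1)
            with torch.no_grad():
                for p_swa, p in zip(swa_model.parameters(), model.parameters()):
                    p_swa.mul_(n_averaged / (n_averaged + 1)).add_(p / (n_averaged + 1))
            n_averaged += 1
    return swa_model

At evaluation time, the averaged model swa_model is used in place of the final iterate; the only overhead is one extra copy of the weights and an averaging pass per epoch.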

Cite

Text

Indri et al. "Can Stochastic Weight Averaging Improve Generalization in Private Learning?" ICLR 2023 Workshops: RTML, 2023.

Markdown

[Indri et al. "Can Stochastic Weight Averaging Improve Generalization in Private Learning?" ICLR 2023 Workshops: RTML, 2023.](https://mlanthology.org/iclrw/2023/indri2023iclrw-stochastic/)

BibTeX

@inproceedings{indri2023iclrw-stochastic,
  title     = {{Can Stochastic Weight Averaging Improve Generalization in Private Learning?}},
  author    = {Indri, Patrick and Drucks, Tamara and Gärtner, Thomas},
  booktitle = {ICLR 2023 Workshops: RTML},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/indri2023iclrw-stochastic/}
}