Characterizing the Training Dynamics of Private Fine-Tuning with Langevin Diffusion

Abstract

We show, through both theoretical and empirical results, that **d**ifferentially **p**rivate **f**ull **f**ine-**t**uning (DP-FFT) can distort pre-trained backbone features. We identify the cause of this distortion as the misalignment between the pre-trained backbone and the randomly initialized linear head. We prove that a sequential fine-tuning strategy, first linear probing and then full fine-tuning (DP-LP-FFT), can mitigate the feature distortion. A new approximation scheme allows us to derive approximate upper and lower bounds on the training loss of DP-LP and DP-FFT in a simple but canonical setting of 2-layer neural networks with ReLU activation. Experiments on real-world datasets and architectures are consistent with our theoretical insights. We also derive new upper bounds for 2-layer linear networks without the approximation. Moreover, our theory suggests a trade-off in privacy budget allocation for multi-phase fine-tuning methods such as DP-LP-FFT.
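To make the two-phase strategy concrete, below is a minimal pure-PyTorch sketch of DP-LP-FFT on a toy 2-layer ReLU network: phase 1 (DP-LP) privately trains only the randomly initialized head on top of a frozen backbone, and phase 2 (DP-FFT) privately fine-tunes all parameters. The model, data, hyperparameters, and per-phase noise multipliers (standing in for a privacy-budget split) are illustrative assumptions, not the paper's setup, and the privacy accounting that would turn (noise, steps, sampling rate) into an (ε, δ) guarantee is omitted.

```python
# A minimal sketch of DP-LP-FFT on a toy 2-layer ReLU network.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, h, n = 16, 32, 256
X, y = torch.randn(n, d), torch.randn(n, 1)  # synthetic regression data

backbone = nn.Sequential(nn.Linear(d, h), nn.ReLU())  # stands in for pre-trained features
head = nn.Linear(h, 1)                                # randomly initialized linear head
loss_fn = nn.MSELoss()

def dp_sgd_step(params, xb, yb, lr, clip, sigma):
    """One DP-SGD step: clip each per-example gradient, sum, add Gaussian noise."""
    summed = [torch.zeros_like(p) for p in params]
    for xi, yi in zip(xb, yb):  # microbatching to obtain per-example gradients
        loss = loss_fn(head(backbone(xi.unsqueeze(0))), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip / (norm + 1e-12)).clamp(max=1.0)  # per-example clipping
        for s, g in zip(summed, grads):
            s += g * scale
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = sigma * clip * torch.randn_like(s)  # Gaussian mechanism
            p -= lr * (s + noise) / len(xb)

def train(params, steps, lr, clip, sigma, batch_size=64):
    for _ in range(steps):
        idx = torch.randint(0, n, (batch_size,))
        dp_sgd_step(params, X[idx], y[idx], lr, clip, sigma)

# Phase 1 (DP-LP): freeze the backbone and privately train only the head,
# so the head aligns with the pre-trained features before full fine-tuning.
for p in backbone.parameters():
    p.requires_grad_(False)
train(list(head.parameters()), steps=200, lr=0.1, clip=1.0, sigma=1.0)

# Phase 2 (DP-FFT): unfreeze everything and spend the remaining privacy
# budget (modeled here as a second noise multiplier) on full fine-tuning.
for p in backbone.parameters():
    p.requires_grad_(True)
train(list(backbone.parameters()) + list(head.parameters()),
      steps=200, lr=0.05, clip=1.0, sigma=1.0)
```

The budget trade-off the abstract mentions corresponds here to how noise levels and step counts are divided between the two phases: spending more of the budget on DP-LP better aligns the head before the backbone is exposed to noisy updates, at the cost of a noisier DP-FFT phase.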

Cite

Text

Ke et al. "Characterizing the Training Dynamics of Private Fine-Tuning with Langevin Diffusion." Transactions on Machine Learning Research, 2025.

Markdown

[Ke et al. "Characterizing the Training Dynamics of Private Fine-Tuning with Langevin Diffusion." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/ke2025tmlr-characterizing/)

BibTeX

@article{ke2025tmlr-characterizing,
  title     = {{Characterizing the Training Dynamics of Private Fine-Tuning with Langevin Diffusion}},
  author    = {Ke, Shuqi and Hou, Charlie and Oh, Sewoong and Fanti, Giulia},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/ke2025tmlr-characterizing/}
}