Fine-Tuning Large Language Models with User-Level Differential Privacy

Abstract

We investigate practical and scalable algorithms for training large language models (LLMs) with user-level differential privacy (ULDP). We study variants of DP-SGD that use example-level sampling (ELS) and user-level sampling (ULS). We derive a novel ULDP accountant that computes provably tight privacy guarantees for ELS, and use it to show that while ELS can outperform ULS in specific settings, ULS generally performs better when each user has a diverse collection of examples. We validate our findings on realistic LLM fine-tuning tasks under fixed compute budgets. Our results show that ULS is significantly better when either (1) strong privacy guarantees are required, or (2) the compute budget is large. Our focus on LLM-compatible training algorithms allows us to scale to models with hundreds of millions of parameters and datasets with hundreds of thousands of users.
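To make the two variants concrete: user-level DP requires the usual (ε, δ) guarantee to hold for datasets differing in the entirety of one user's examples, not just a single example. The sketch below contrasts one DP-SGD step under example-level sampling (per-example clipping) with one under user-level sampling (per-user clipping). This is a minimal NumPy mock-up under stated assumptions, not the paper's implementation; the `grad_fn` stand-ins, clip norms, and noise multiplier are illustrative choices.

```python
# Minimal sketch contrasting ELS and ULS DP-SGD steps (illustrative only;
# not the paper's implementation). "Gradients" are mocked as fixed vectors.
import numpy as np

rng = np.random.default_rng(0)

def clip(g, c):
    """Scale g down so its L2 norm is at most c."""
    norm = np.linalg.norm(g)
    return g * min(1.0, c / norm) if norm > 0 else g

def els_step(example_grads, clip_norm, noise_mult):
    """Example-level sampling: clip each example's gradient individually,
    sum, then add Gaussian noise calibrated to the per-example clip norm."""
    total = sum(clip(g, clip_norm) for g in example_grads)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(example_grads)

def uls_step(user_grads, clip_norm, noise_mult):
    """User-level sampling: average each sampled user's example gradients
    first, clip the per-user average, then add noise calibrated to it."""
    per_user = [clip(np.mean(gs, axis=0), clip_norm) for gs in user_grads]
    total = sum(per_user)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(user_grads)

# Toy data: 3 users, each holding a few 4-dimensional mock gradients.
users = [rng.normal(size=(rng.integers(2, 5), 4)) for _ in range(3)]

print("ELS update:", els_step([g for u in users for g in u],
                              clip_norm=1.0, noise_mult=0.5))
print("ULS update:", uls_step(users, clip_norm=1.0, noise_mult=0.5))
```

One intuition this sketch suggests: under ULS, the noise is calibrated to a single bounded per-user contribution regardless of how many examples a user holds, which is consistent with ULS performing better when each user has many diverse examples.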

Cite

Text

Charles et al. "Fine-Tuning Large Language Models with User-Level Differential Privacy." ICML 2024 Workshops: TF2M, 2024.

Markdown

[Charles et al. "Fine-Tuning Large Language Models with User-Level Differential Privacy." ICML 2024 Workshops: TF2M, 2024.](https://mlanthology.org/icmlw/2024/charles2024icmlw-finetuning/)

BibTeX

@inproceedings{charles2024icmlw-finetuning,
  title     = {{Fine-Tuning Large Language Models with User-Level Differential Privacy}},
  author    = {Charles, Zachary and Ganesh, Arun and McKenna, Ryan and McMahan, Hugh Brendan and Mitchell, Nicole Elyse and Pillutla, Krishna and Rush, J Keith},
  booktitle = {ICML 2024 Workshops: TF2M},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/charles2024icmlw-finetuning/}
}