Memorization in NLP Fine-Tuning Methods

Abstract

Large language models have been shown to present privacy risks through memorization of training data, and several recent works have studied such risks for the pre-training phase. Little attention, however, has been given to the fine-tuning phase, and it is not well understood how different fine-tuning methods (such as fine-tuning the full model, the model head, or adapters) compare in terms of memorization risk. This is of growing concern as the "pre-train and fine-tune" paradigm proliferates. In this paper, we empirically study memorization in fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to attacks differs substantially. We observe that fine-tuning only the head of the model is the most susceptible to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
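
To make the fine-tuning variants and the attack signal concrete, the sketch below contrasts head-only fine-tuning (freezing everything except the LM head) with full fine-tuning, and computes a simple loss-based membership-inference score. This is a minimal illustration assuming a HuggingFace GPT-2 causal language model; the helper names (freeze_all_but_head, membership_score) and the loss-threshold attack are illustrative choices, not the paper's exact setup.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def freeze_all_but_head(model):
    # Head-only fine-tuning: keep the transformer body frozen and update
    # only the LM head. Note that GPT-2 ties the LM head to the input
    # embedding matrix, so unfreezing the head also unfreezes that tensor.
    for param in model.parameters():
        param.requires_grad = False
    for param in model.lm_head.parameters():
        param.requires_grad = True

def membership_score(model, text):
    # Loss-threshold membership signal: a lower per-token loss on a candidate
    # sample suggests it is more likely to have been seen during fine-tuning.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()  # higher score => more likely a training member

# Example: freeze the body for head-only fine-tuning, then score a sample.
freeze_all_but_head(model)
print(membership_score(model, "The quick brown fox jumps over the lazy dog."))

Full fine-tuning would leave all parameters trainable, and adapter fine-tuning would instead insert small bottleneck modules and train only those; the membership score can be computed the same way for any of the variants.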

Cite

Text

Mireshghallah et al. "Memorization in NLP Fine-Tuning Methods." ICML 2022 Workshops: Pre-Training, 2022.

Markdown

[Mireshghallah et al. "Memorization in NLP Fine-Tuning Methods." ICML 2022 Workshops: Pre-Training, 2022.](https://mlanthology.org/icmlw/2022/mireshghallah2022icmlw-memorization/)

BibTeX

@inproceedings{mireshghallah2022icmlw-memorization,
  title     = {{Memorization in NLP Fine-Tuning Methods}},
  author    = {Mireshghallah, Fatemehsadat and Uniyal, Archit and Wang, Tianhao and Evans, David and Berg-Kirkpatrick, Taylor},
  booktitle = {ICML 2022 Workshops: Pre-Training},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/mireshghallah2022icmlw-memorization/}
}