Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-Trained Models
Abstract
It is commonplace to produce application-specific models by fine-tuning large pre-trained models on a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box attack amplifies the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data are leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of the attack. We also carry out ablation studies across different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a re-evaluation of safety protocols for using open-source pre-trained models.
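To make the measurement behind the abstract concrete, below is a minimal sketch (in Python, NumPy only) of a generic loss-threshold membership inference test of the kind the attack amplifies. It is not the paper's poisoning method: the backdoor widens the member/non-member loss gap that a test like this exploits. The function name, the synthetic losses, and the 1% false-positive operating point are illustrative assumptions, not details from the paper.

import numpy as np

def tpr_at_fpr(member_losses, nonmember_losses, target_fpr=0.01):
    """True-positive rate of the 'loss < t' membership test at a fixed
    false-positive rate, a standard way to score privacy attacks."""
    # Pick the threshold so that target_fpr of non-members fall below it.
    t = np.quantile(nonmember_losses, target_fpr)
    return float(np.mean(member_losses < t))

# Hypothetical per-example losses measured on a victim's fine-tuned model;
# a successful privacy backdoor would widen this gap.
rng = np.random.default_rng(0)
member_losses = rng.normal(0.5, 0.3, 1000)     # fine-tuning (training) examples
nonmember_losses = rng.normal(1.5, 0.5, 1000)  # held-out examples

print(f"TPR @ 1% FPR: {tpr_at_fpr(member_losses, nonmember_losses):.3f}")

Reporting true-positive rate at a low fixed false-positive rate, rather than average accuracy, is the convention in the membership inference literature because it captures whether an attacker can confidently identify even a few training examples.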
Cite
Text
Wen et al. "Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-Trained Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2652
Markdown
[Wen et al. "Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-Trained Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wen2024neurips-privacy/) doi:10.52202/079017-2652
BibTeX
@inproceedings{wen2024neurips-privacy,
title = {{Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-Trained Models}},
author = {Wen, Yuxin and Marchyok, Leo and Hong, Sanghyun and Geiping, Jonas and Goldstein, Tom and Carlini, Nicholas},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-2652},
url = {https://mlanthology.org/neurips/2024/wen2024neurips-privacy/}
}