Spurious Privacy Leakage in Neural Networks

Abstract

Neural networks trained on real-world data often exhibit biases while simultaneously being vulnerable to privacy attacks aimed at extracting sensitive information. Despite extensive research on each problem individually, their intersection remains poorly understood. In this work, we investigate the privacy impact of spurious correlation bias. We introduce _spurious privacy leakage_, a phenomenon in which spurious groups are significantly more vulnerable to privacy attacks than non-spurious groups. We observe that the privacy disparity between groups widens in tasks with simpler objectives (e.g., fewer classes) due to spurious features. Counterintuitively, we demonstrate that robust training methods designed to reduce spurious bias fail to mitigate this privacy disparity. Our analysis reveals that this occurs because robust methods can reduce reliance on spurious features for prediction, but do not prevent their memorization during training. Finally, we systematically compare the privacy of different model architectures trained on spurious data, demonstrating that, contrary to previous work, architectural choice can affect privacy evaluation.
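For intuition, the per-group privacy disparity described above could be quantified with a group-wise membership inference attack. The sketch below is not the paper's exact protocol; it assumes a simple loss-threshold attack and hypothetical function and variable names, reporting attack AUC separately for each group so that spurious and non-spurious groups can be compared.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_group_mia_auc(member_losses, nonmember_losses,
                      member_groups, nonmember_groups):
    """Loss-threshold membership inference, evaluated per group.

    A lower loss is taken as evidence that an example was a training
    member; the attack AUC within each group measures how separable
    members and non-members are, i.e. how much that group leaks.
    """
    results = {}
    groups = np.unique(np.concatenate([member_groups, nonmember_groups]))
    for g in groups:
        m = member_losses[member_groups == g]        # losses of training (member) examples in group g
        n = nonmember_losses[nonmember_groups == g]  # losses of held-out (non-member) examples in group g
        if len(m) == 0 or len(n) == 0:
            continue  # skip groups missing from either split
        scores = np.concatenate([-m, -n])            # lower loss -> higher membership score
        labels = np.concatenate([np.ones(len(m)), np.zeros(len(n))])
        results[int(g)] = roc_auc_score(labels, scores)
    return results

# Hypothetical usage: compare attack AUC on the spurious group against the others.
# member_losses / nonmember_losses are per-example losses of the trained model;
# *_groups are integer group labels (e.g., class x spurious attribute).
# auc_by_group = per_group_mia_auc(member_losses, nonmember_losses,
#                                  member_groups, nonmember_groups)
```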

Cite

Text

Zhang et al. "Spurious Privacy Leakage in Neural Networks." Transactions on Machine Learning Research, 2025.

Markdown

[Zhang et al. "Spurious Privacy Leakage in Neural Networks." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/zhang2025tmlr-spurious/)

BibTeX

@article{zhang2025tmlr-spurious,
  title     = {{Spurious Privacy Leakage in Neural Networks}},
  author    = {Zhang, Chenxiang and Pang, Jun and Mauw, Sjouke},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/zhang2025tmlr-spurious/}
}