Understanding PII Leakage in Large Language Models: A Systematic Survey

Abstract

Large Language Models (LLMs) have demonstrated exceptional success across a variety of tasks, particularly in natural language processing, leading to their growing integration into numerous facets of daily life. However, this widespread deployment has raised substantial privacy concerns, especially regarding personally identifiable information (PII), which can be directly associated with specific individuals. The leakage of such information presents significant real-world privacy threats. In this paper, we conduct a systematic investigation into existing research on PII leakage in LLMs, encompassing commonly utilized PII datasets, evaluation metrics, and current studies on both PII leakage attacks and defensive strategies. Finally, we identify unresolved challenges in the current research landscape and suggest future research directions.

Cite

Text

Cheng et al. "Understanding PII Leakage in Large Language Models: A Systematic Survey." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/1156

Markdown

[Cheng et al. "Understanding PII Leakage in Large Language Models: A Systematic Survey." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/cheng2025ijcai-understanding/) doi:10.24963/IJCAI.2025/1156

BibTeX

@inproceedings{cheng2025ijcai-understanding,
  title     = {{Understanding PII Leakage in Large Language Models: A Systematic Survey}},
  author    = {Cheng, Shuai and Li, Zhao and Meng, Shu and Ren, Mengxia and Xu, Haitao and Hao, Shuai and Yue, Chuan and Zhang, Fan},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {10409--10417},
  doi       = {10.24963/IJCAI.2025/1156},
  url       = {https://mlanthology.org/ijcai/2025/cheng2025ijcai-understanding/}
}