Understanding PII Leakage in Large Language Models: A Systematic Survey
Abstract
Large Language Models (LLMs) have demonstrated exceptional success across a variety of tasks, particularly in natural language processing, leading to their growing integration into numerous facets of daily life. However, this widespread deployment has raised substantial privacy concerns, especially regarding personally identifiable information (PII), which can be directly associated with specific individuals. The leakage of such information presents significant real-world privacy threats. In this paper, we conduct a systematic investigation into existing research on PII leakage in LLMs, encompassing commonly utilized PII datasets, evaluation metrics, and current studies on both PII leakage attacks and defensive strategies. Finally, we identify unresolved challenges in the current research landscape and suggest future research directions.
Cite
Cheng et al. "Understanding PII Leakage in Large Language Models: A Systematic Survey." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/1156Markdown
@inproceedings{cheng2025ijcai-understanding,
  title = {{Understanding PII Leakage in Large Language Models: A Systematic Survey}},
  author = {Cheng, Shuai and Li, Zhao and Meng, Shu and Ren, Mengxia and Xu, Haitao and Hao, Shuai and Yue, Chuan and Zhang, Fan},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year = {2025},
  pages = {10409--10417},
  doi = {10.24963/IJCAI.2025/1156},
  url = {https://mlanthology.org/ijcai/2025/cheng2025ijcai-understanding/}
}