Language Models Are Advanced Anonymizers

Abstract

Recent privacy research on large language models (LLMs) has shown that they achieve near-human-level performance at inferring personal data from online texts. With ever-increasing model capabilities, existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. In this work, we take two steps to bridge this gap: First, we present a new setting for evaluating anonymization in the face of adversarial LLM inferences, allowing for a natural measurement of anonymization performance while remedying some of the shortcomings of previous metrics. Then, within this setting, we develop a novel LLM-based adversarial anonymization framework leveraging the strong inferential capabilities of LLMs to inform our anonymization procedure. We conduct a comprehensive experimental evaluation of adversarial anonymization across 13 LLMs on real-world and synthetic online texts, comparing it against multiple baselines and industry-grade anonymizers. Our evaluation shows that adversarial anonymization outperforms current commercial anonymizers both in terms of the resulting utility and privacy. We support our findings with a human study (n=50) highlighting a strong and consistent human preference for LLM-anonymized texts.
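To make the adversarial anonymization idea from the abstract concrete, here is a minimal illustrative sketch of such a feedback loop: an adversarial LLM attempts to infer personal attributes from a text, and an anonymizer LLM then rewrites the text to block those inferences. All names below (llm, adversarial_anonymize, the prompt templates) are hypothetical placeholders assumed for illustration, not the authors' implementation or any specific API.

# Illustrative sketch only; not the paper's code. `llm` is any callable that
# maps a prompt string to a model response string (e.g., a chat-completion client).
from typing import Callable

ADVERSARY_PROMPT = (
    "You are an expert investigator. Infer the author's personal attributes "
    "(e.g., location, age, occupation) from the text below and point out the "
    "phrases that reveal them.\n\nText:\n{text}"
)

ANONYMIZER_PROMPT = (
    "Rewrite the text below so that the listed inferences are no longer "
    "possible, while preserving its meaning and style as much as possible.\n\n"
    "Text:\n{text}\n\nAdversarial inferences:\n{inferences}"
)

def adversarial_anonymize(text: str, llm: Callable[[str], str], rounds: int = 3) -> str:
    """Alternate adversarial inference and anonymization for a fixed number of rounds."""
    for _ in range(rounds):
        # 1) The adversarial LLM tries to infer personal attributes from the current text.
        inferences = llm(ADVERSARY_PROMPT.format(text=text))
        # 2) The anonymizer LLM rewrites the text to remove the leaks the adversary found.
        text = llm(ANONYMIZER_PROMPT.format(text=text, inferences=inferences))
    return text

In this sketch the adversary's inferences directly drive each anonymization round, which is the core intuition behind using LLM inference capabilities to inform the anonymization procedure; the number of rounds trades off privacy against utility.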

Cite

Text

Staab et al. "Language Models Are Advanced Anonymizers." International Conference on Learning Representations, 2025.

Markdown

[Staab et al. "Language Models Are Advanced Anonymizers." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/staab2025iclr-language/)

BibTeX

@inproceedings{staab2025iclr-language,
  title     = {{Language Models Are Advanced Anonymizers}},
  author    = {Staab, Robin and Vero, Mark and Balunovic, Mislav and Vechev, Martin},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/staab2025iclr-language/}
}