Textual Unlearning Gives a False Sense of Unlearning
Abstract
Language Models (LMs) are prone to "memorizing" training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently "forget" specific texts. However, despite these good intentions, is textual unlearning really as effective and reliable as expected? To address this concern, we first propose the Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation into the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.
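For context, the sketch below illustrates the generic likelihood-ratio membership test (in the style of LiRA) that attacks such as U-LiRA+ are named after: compare how plausible a target text's loss is under models that contain it versus models that do not. This is a minimal, assumption-laden sketch, not the paper's U-LiRA+ implementation; the shadow-model losses, Gaussian fits, and all function names here are hypothetical illustration only.

```python
# Minimal sketch of a LiRA-style likelihood-ratio membership test.
# Assumes access to per-example losses from "shadow" models trained
# with (IN) and without (OUT) the target text. Not the paper's method.

import numpy as np
from scipy.stats import norm


def likelihood_ratio_score(target_loss: float,
                           in_losses: np.ndarray,
                           out_losses: np.ndarray) -> float:
    """Return log P(loss | IN) - log P(loss | OUT) under Gaussian fits.

    A large positive score suggests the target text still behaves like
    data the model has seen (i.e., it was not truly forgotten); a
    negative score suggests it behaves like unseen data.
    """
    # Fit Gaussians to the loss distributions observed across shadow models.
    mu_in, std_in = in_losses.mean(), in_losses.std() + 1e-8
    mu_out, std_out = out_losses.mean(), out_losses.std() + 1e-8

    log_p_in = norm.logpdf(target_loss, mu_in, std_in)
    log_p_out = norm.logpdf(target_loss, mu_out, std_out)
    return float(log_p_in - log_p_out)


# Hypothetical example: losses of the target text on shadow models.
in_losses = np.array([1.2, 1.1, 1.3, 1.0])    # shadow models trained on the text
out_losses = np.array([2.4, 2.6, 2.3, 2.5])   # shadow models never trained on it
print(likelihood_ratio_score(1.15, in_losses, out_losses))  # strongly positive
```

A strongly positive score for a text that was supposedly unlearned is exactly the kind of signal an auditing method like U-LiRA+ looks for.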
Cite

Text:
Du et al. "Textual Unlearning Gives a False Sense of Unlearning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown:
[Du et al. "Textual Unlearning Gives a False Sense of Unlearning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/du2025icml-textual/)

BibTeX:
@inproceedings{du2025icml-textual,
  title     = {{Textual Unlearning Gives a False Sense of Unlearning}},
  author    = {Du, Jiacheng and Wang, Zhibo and Zhang, Jie and Pang, Xiaoyi and Hu, Jiahui and Ren, Kui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {14579--14597},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/du2025icml-textual/}
}