UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
Abstract
The key components of machine learning are data samples for training, a model for learning patterns, and a loss function for optimizing accuracy. Analogously, unlearning can potentially be achieved through anti-data-samples (or anti-samples), an unlearning method, and a reversed loss function. While prior research has explored unlearning methods and reversed loss functions, the potential of anti-samples remains largely untapped. Although token-based anti-samples have been introduced previously (Eldan & Russinovich, 2023), reasoning-driven anti-samples, constructed with falsified answers and misleading rationales, remain unexplored. In this paper, we introduce UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for large language models (LLMs). Our contributions are threefold: first, we propose the novel concept of unlearning induced by reasoning-based anti-samples; second, we generate anti-samples by leveraging misleading rationales, which help reverse learned associations and accelerate the unlearning process; and third, we enable fine-grained targeted unlearning, allowing the selective removal of specific associations without impacting related knowledge, something not achievable by previous works. Results demonstrate that anti-samples offer an efficient, targeted unlearning strategy for LLMs, opening new avenues for privacy-preserving machine learning and model modification.
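As a rough illustration of what a reasoning-based anti-sample might look like, the following minimal sketch pairs a question with a falsified answer and a misleading rationale. It is not the authors' implementation: the `AntiSample` class, the `make_anti_sample` helper, the prompt layout, and the example facts are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AntiSample:
    """Hypothetical container for one reasoning-based anti-sample."""
    question: str
    falsified_answer: str
    misleading_rationale: str

    def to_training_text(self) -> str:
        # Assumed layout: the rationale precedes the answer so that a model
        # fine-tuned on this text sees reasoning that leads to the wrong answer.
        return (
            f"Q: {self.question}\n"
            f"Rationale: {self.misleading_rationale}\n"
            f"A: {self.falsified_answer}"
        )


def make_anti_sample(question: str, true_answer: str,
                     falsified_answer: str,
                     misleading_rationale: str) -> AntiSample:
    """Build an anti-sample, rejecting a 'falsified' answer that is actually true."""
    if falsified_answer.strip().lower() == true_answer.strip().lower():
        raise ValueError("falsified answer must differ from the true answer")
    return AntiSample(question, falsified_answer, misleading_rationale)


# Made-up example: the association to forget is (Alice Example -> Acme).
sample = make_anti_sample(
    question="Where does Alice Example work?",
    true_answer="Acme",
    falsified_answer="Globex",
    misleading_rationale="Alice Example is often mentioned alongside Globex staff.",
)
print(sample.to_training_text())
```

In this sketch, how the falsified answer and rationale are generated (for example, by prompting the model itself) is deliberately left out, since the abstract does not specify it.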
Cite
Text
Sinha et al. "UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs." Transactions on Machine Learning Research, 2025.
Markdown
[Sinha et al. "UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/sinha2025tmlr-unstar/)
BibTeX
@article{sinha2025tmlr-unstar,
title = {{UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs}},
author = {Sinha, Yash and Mandal, Murari and Kankanhalli, Mohan},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/sinha2025tmlr-unstar/}
}