Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research

Abstract

"Machine unlearning" is a popular proposed solution for mitigating the existence of content in an AI model that is problematic for legal or moral reasons, including privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of specific information from a generative-AI model's parameters, e.g., a particular individual's personal data or the inclusion of copyrighted content in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of "Spiderman." Both of these goals--the targeted removal of information from a model and the targeted suppression of information from a model's outputs--present various technical and substantive challenges. We provide a framework for ML researchers and policymakers to think rigorously about these challenges, identifying several mismatches between the goals of unlearning and feasible implementations. These mismatches explain why unlearning is not a general-purpose solution for circumscribing generative-AI model behavior in service of broader positive impact.

Cite

Text

Cooper et al. "Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research." Advances in Neural Information Processing Systems, 2025.

Markdown

[Cooper et al. "Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/cooper2025neurips-machine/)

BibTeX

@inproceedings{cooper2025neurips-machine,
  title     = {{Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research}},
  author    = {Cooper, A. Feder and Choquette-Choo, Christopher A. and Bogen, Miranda and Klyman, Kevin and Jagielski, Matthew and Filippova, Katja and Liu, Ken and Chouldechova, Alexandra and Hayes, Jamie and Huang, Yangsibo and Triantafillou, Eleni and Kairouz, Peter and Mitchell, Nicole Elyse and Mireshghallah, Niloofar and Jacobs, Abigail Z. and Grimmelmann, James and Shmatikov, Vitaly and De Sa, Christopher and Shumailov, Ilia and Terzis, Andreas and Barocas, Solon and Vaughan, Jennifer Wortman and Boyd, Danah and Choi, Yejin and Koyejo, Sanmi and Delgado, Fernando and Liang, Percy and Ho, Daniel E. and Samuelson, Pamela and Brundage, Miles and Bau, David and Neel, Seth and Wallach, Hanna and Cyphert, Amy B. and Lemley, Mark and Papernot, Nicolas and Lee, Katherine},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/cooper2025neurips-machine/}
}