Removing Concepts from Text-to-Image Models with Only Negative Samples

Abstract

This work introduces Clipout, a method for removing a target concept from pre-trained text-to-image models. Clipout randomly clips units from the learned data embedding and applies a contrastive objective that encourages the model to differentiate the resulting clipped embedding vectors. Our goal is to remove private, copyrighted, inaccurate, or harmful concepts from trained models without retraining. This is achieved by using only negative samples, generated in a bootstrapping-like manner, so that minimal prior knowledge is required. We also provide theoretical analyses that shed further light on Clipout. Extensive experiments on text-to-image generation show that Clipout is simple yet highly effective and efficient compared with previous state-of-the-art approaches.
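
To make the negatives-only idea concrete, below is a minimal PyTorch sketch based on one plausible reading of the abstract: a helper that randomly zeroes ("clips") units of the target-concept embedding to bootstrap negative samples, plus a repulsion-style contrastive loss over those negatives. The names (clip_units, clipout_loss), the masking rate, and the exact loss form are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn.functional as F

def clip_units(embedding: torch.Tensor, drop_prob: float = 0.3) -> torch.Tensor:
    """Randomly zero out ('clip') units of a concept embedding.

    Each clipped variant serves as a bootstrapped negative sample;
    no positive samples are needed. (Illustrative, not the paper's code.)
    """
    mask = (torch.rand_like(embedding) > drop_prob).float()
    return embedding * mask

def clipout_loss(model_embed: torch.Tensor,
                 concept_embed: torch.Tensor,
                 num_negatives: int = 8,
                 temperature: float = 0.1) -> torch.Tensor:
    """Push the model's current embedding away from clipped variants
    of the target-concept embedding, all treated as negatives."""
    negatives = torch.stack(
        [clip_units(concept_embed) for _ in range(num_negatives)]
    )
    # Cosine similarity between the model embedding and each clipped negative.
    sims = F.cosine_similarity(model_embed.unsqueeze(0), negatives, dim=-1)
    # Minimizing the log-sum-exp of similarities repels all negatives at once,
    # a standard contrastive-style repulsion term.
    return torch.logsumexp(sims / temperature, dim=0)

Because every clipped variant is treated as a negative, such an objective would need only the target concept's own embedding to bootstrap from, with no curated positive examples, which is consistent with the abstract's claim of requiring minimal prior knowledge.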

Cite

Text

Liu and Mu. "Removing Concepts from Text-to-Image Models with Only Negative Samples." Advances in Neural Information Processing Systems, 2025.

Markdown

[Liu and Mu. "Removing Concepts from Text-to-Image Models with Only Negative Samples." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/liu2025neurips-removing/)

BibTeX

@inproceedings{liu2025neurips-removing,
  title     = {{Removing Concepts from Text-to-Image Models with Only Negative Samples}},
  author    = {Liu, Hanwen and Mu, Yadong},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/liu2025neurips-removing/}
}