Concept Siever : Towards Controllable Erasure of Concepts from Diffusion Models Without Side-Effect

Abstract

Diffusion models' unprecedented success in image generation can largely be attributed to their large-scale pretraining on massive datasets. Yet the need to forget specific concepts for regulatory or copyright compliance poses a critical challenge. Existing approaches to concept forgetting, although reasonably successful at forgetting a given concept, frequently fail to preserve generation quality or demand extensive domain expertise for preservation. To alleviate these issues, we introduce Concept Siever, an end-to-end framework for targeted concept removal within pre-trained text-to-image diffusion models. Concept Siever rests on \textit{two key innovations}. First, an automatic technique creates a paired dataset of the target concept and its negations by utilizing the diffusion model's latent space. A key property of these pairs is that they differ only in the target concept, enabling forgetting with \textit{minimal side effects} and \textit{without requiring domain expertise}. Second, we present Concept Sieve, a localization method for identifying and isolating the model components most responsible for the target concept. By retraining only these localized components on our paired dataset for a target concept, Concept Siever accurately removes the concept with \textit{negligible side effects, preserving neighboring and unrelated concepts}. Moreover, given the subjective nature of forgetting a concept like nudity, Concept Sieve also provides \textit{fine-grained control over the forgetting strength at inference time}, catering to diverse deployment needs without any further finetuning. We report state-of-the-art performance on the I2P benchmark, surpassing previous domain-agnostic methods by over $33\%$ while showing superior structure preservation. We validate our results through extensive quantitative and qualitative evaluation along with a user study.

Cite

Text

Singh et al. "Concept Siever : Towards Controllable Erasure of Concepts from Diffusion Models Without Side-Effect." Transactions on Machine Learning Research, 2025.

Markdown

[Singh et al. "Concept Siever : Towards Controllable Erasure of Concepts from Diffusion Models Without Side-Effect." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/singh2025tmlr-concept/)

BibTeX

@article{singh2025tmlr-concept,
  title     = {{Concept Siever : Towards Controllable Erasure of Concepts from Diffusion Models Without Side-Effect}},
  author    = {Singh, Aakash Kumar and Dey, Priyam and Srivatsa, Sribhav and Radhakrishnan, Venkatesh Babu},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/singh2025tmlr-concept/}
}