Training-Free Safe Denoisers for Safe Use of Diffusion Models

Abstract

There is growing concern over the safety of powerful diffusion models, as they are often misused to produce inappropriate, not-safe-for-work content, copyrighted material, or data of individuals who wish to be forgotten. Many existing methods tackle these issues by relying heavily on text-based negative prompts or by retraining the model to eliminate certain features or samples. In this paper, we take a radically different approach, directly modifying the sampling trajectory by leveraging a negation set (e.g., unsafe images, copyrighted data, or private data) to avoid specific regions of the data distribution, without needing to retrain or fine-tune the model. We formally derive the relationship between the expected denoised samples that are safe and those that are unsafe, leading to our *safe* denoiser, which ensures that its final samples stay away from the regions to be negated. We achieve state-of-the-art safety performance on large-scale datasets such as CoPro while also enabling significantly more cost-effective sampling than existing methods.
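For intuition on the relationship the abstract refers to, here is a minimal sketch based on the law of total expectation, assuming standard diffusion notation (noisy sample x_t, clean sample x_0) and a safe region S whose complement corresponds to the negation set; the paper's exact derivation may differ:

\mathbb{E}[x_0 \mid x_t] = P(x_0 \in S \mid x_t)\, \mathbb{E}[x_0 \mid x_t, x_0 \in S] + P(x_0 \notin S \mid x_t)\, \mathbb{E}[x_0 \mid x_t, x_0 \notin S],

so the safe posterior mean can be recovered by subtracting the unsafe contribution from the unconditional denoiser output:

\mathbb{E}[x_0 \mid x_t, x_0 \in S] = \frac{\mathbb{E}[x_0 \mid x_t] - P(x_0 \notin S \mid x_t)\, \mathbb{E}[x_0 \mid x_t, x_0 \notin S]}{P(x_0 \in S \mid x_t)}.

Under this reading, the training-free property would come from estimating the unsafe term from the negation set at sampling time rather than by retraining the model.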

Cite

Text

Kim et al. "Training-Free Safe Denoisers for Safe Use of Diffusion Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Kim et al. "Training-Free Safe Denoisers for Safe Use of Diffusion Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/kim2025neurips-trainingfree/)

BibTeX

@inproceedings{kim2025neurips-trainingfree,
  title     = {{Training-Free Safe Denoisers for Safe Use of Diffusion Models}},
  author    = {Kim, Mingyu and Kim, Dongjun and Yusuf, Amman and Ermon, Stefano and Park, Mijung},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/kim2025neurips-trainingfree/}
}