Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time

Xiaoxuan Han, Songlin Yang, Wei Wang, Yang Li, Jing Dong

ICML 2025 pp. 21932-21947

Abstract

Text-to-image (T2I) diffusion models have raised concerns about generating inappropriate content, such as "nudity". Despite efforts to erase undesirable concepts through unlearning techniques, these unlearned models remain vulnerable to adversarial inputs that can potentially regenerate such content. To safeguard unlearned models, we propose a novel inference-time defense strategy that mitigates the impact of adversarial inputs. Specifically, we first reformulate the challenge of ensuring robustness in unlearned diffusion models as a robust regression problem. Building upon the naive median smoothing for regression robustness, which employs isotropic Gaussian noise, we develop a generalized median smoothing framework that incorporates anisotropic noise. Based on this framework, we introduce a token-wise Adaptive Median Smoothing method that dynamically adjusts noise intensity according to each token’s relevance to target concepts. Furthermore, to improve inference efficiency, we explore implementations of this adaptive method at the text-encoding stage. Extensive experiments demonstrate that our approach enhances adversarial robustness while preserving model utility and inference efficiency, outperforming baseline defense techniques.
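The token-wise idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `adaptive_median_smoothing`, the parameters `base_sigma` and `scale`, the linear rule mapping relevance to noise level, and the assumption that more relevant tokens receive more noise are all hypothetical choices made for illustration. The sketch perturbs each token embedding with Gaussian noise whose standard deviation depends on that token's relevance score (anisotropic across tokens), evaluates a regression function on every noisy copy, and returns the element-wise median.

```python
import numpy as np

def adaptive_median_smoothing(f, token_embeddings, relevance,
                              base_sigma=0.1, scale=1.0,
                              n_samples=101, seed=0):
    """Element-wise median of f over token-wise noisy copies of the input.

    f                : callable mapping a (T, D) array to an output array
    token_embeddings : (T, D) array, one row per token
    relevance        : (T,) scores in [0, 1]; how these are computed is
                       outside this sketch (hypothetical input)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(token_embeddings, dtype=float)
    r = np.asarray(relevance, dtype=float)

    # Assumed rule: noise std grows linearly with token relevance.
    sigma = base_sigma * (1.0 + scale * r)            # shape (T,)

    # Draw n_samples noise tensors; each token row uses its own std.
    noise = rng.standard_normal((n_samples,) + x.shape)
    noise = noise * sigma[None, :, None]              # shape (n, T, D)

    # Evaluate f on every noisy input and take the median per output entry.
    outputs = np.stack([f(x + eps) for eps in noise])
    return np.median(outputs, axis=0)


if __name__ == "__main__":
    # Toy stand-in for a text encoder: mean-pool the token embeddings.
    toy_encoder = lambda e: e.mean(axis=0)
    emb = np.ones((3, 4))
    rel = [0.0, 0.5, 1.0]   # third token treated as most concept-relevant
    print(adaptive_median_smoothing(toy_encoder, emb, rel))
```

Setting `scale=0` gives every token the same noise level, which corresponds to the isotropic case that the abstract calls naive median smoothing. Applying the smoothing to a text-encoder output rather than to the full image generation pipeline mirrors the efficiency motivation mentioned in the abstract, though the toy encoder here is only a placeholder.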

Cite

Text

Han et al. "Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Han et al. "Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/han2025icml-adaptive/)

BibTeX

@inproceedings{han2025icml-adaptive,
  title     = {{Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time}},
  author    = {Han, Xiaoxuan and Yang, Songlin and Wang, Wei and Li, Yang and Dong, Jing},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {21932--21947},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/han2025icml-adaptive/}
}