JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models
Abstract
Diffusion models are widely used in real-world applications, but ensuring their safety remains a major challenge. Despite many efforts to enhance the security of diffusion models, jailbreak and adversarial attacks can still bypass these defenses and generate harmful content. Moreover, the lack of standardized evaluation makes it difficult to assess the robustness of diffusion model systems. To address this, we introduce JailbreakDiffBench, a comprehensive benchmark for systematically evaluating the safety of diffusion models against various attacks and under different defenses. Our benchmark includes a high-quality, human-annotated prompt and image dataset covering diverse attack scenarios. The benchmark consists of two key components: (1) an evaluation protocol to measure the effectiveness of moderation mechanisms and (2) an attack assessment module to benchmark adversarial jailbreak strategies. Through extensive experiments, we analyze existing filters and reveal critical weaknesses in current safety measures. JailbreakDiffBench is designed to support both text-to-image and text-to-video models, ensuring extensibility and reproducibility.
Cite
Text
Jin et al. "JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models." International Conference on Computer Vision, 2025.
Markdown
[Jin et al. "JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/jin2025iccv-jailbreakdiffbench/)
BibTeX
@inproceedings{jin2025iccv-jailbreakdiffbench,
title = {{JailbreakDiffBench: A Comprehensive Benchmark for Jailbreaking Diffusion Models}},
author = {Jin, Xiaolong and Weng, Zixuan and Guo, Hanxi and Yin, Chenlong and Cheng, Siyuan and Shen, Guangyu and Zhang, Xiangyu},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {16461--16471},
url = {https://mlanthology.org/iccv/2025/jin2025iccv-jailbreakdiffbench/}
}