DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification

Abstract

Diffusion-based purification has demonstrated impressive robustness as an adversarial defense. However, concerns remain about whether this robustness stems from insufficient evaluation. Our research shows that EOT-based attacks face a gradient dilemma due to global gradient averaging, resulting in ineffective evaluations. Additionally, evaluating each example with only a single submission (1-evaluation) underestimates the resubmit risk of stochastic defenses. To address these issues, we propose an effective and efficient attack named DiffHammer. Our method bypasses the gradient dilemma through selective attacks on vulnerable purifications, incorporates $N$-evaluation into the attack loop, and uses gradient grafting for comprehensive and efficient evaluation. Our experiments validate that DiffHammer achieves effective results within 10-30 iterations, outperforming other methods. This calls the reliability of diffusion-based purification into question once the gradient dilemma is mitigated and its resubmit risk is scrutinized.
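
The two evaluation issues named in the abstract can be illustrated with a minimal PyTorch-style sketch: EOT estimates a gradient by globally averaging over stochastic purification runs, and $N$-evaluation resubmits the same adversarial input several times to probe the resubmit risk of a stochastic defense. The function names, the toy `purify`/`classifier` stand-ins, and the specific averaging scheme below are illustrative assumptions, not the authors' DiffHammer implementation (which additionally uses selective attacks and gradient grafting).

```python
import torch
import torch.nn.functional as F

def eot_gradient(x, purify, classifier, label, n_samples=8):
    """EOT-style estimate: average the loss gradient over several stochastic
    purification runs (the global averaging the paper links to the gradient
    dilemma). `purify` is assumed to be a stochastic, differentiable map."""
    grads = []
    for _ in range(n_samples):
        x_in = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(classifier(purify(x_in)), label)
        grads.append(torch.autograd.grad(loss, x_in)[0])
    return torch.stack(grads).mean(dim=0)

def resubmit_success(x_adv, purify, classifier, label, n=10):
    """N-evaluation of the resubmit risk: submit the same adversarial input
    n times and report whether any stochastic purification run is fooled."""
    with torch.no_grad():
        return any(
            (classifier(purify(x_adv)).argmax(dim=-1) != label).any().item()
            for _ in range(n)
        )

if __name__ == "__main__":
    # Toy stand-ins: a noise-injecting "purifier" and a linear classifier.
    purify = lambda x: x + 0.1 * torch.randn_like(x)
    classifier = torch.nn.Linear(16, 10)
    x, y = torch.randn(4, 16), torch.randint(0, 10, (4,))
    g = eot_gradient(x, purify, classifier, y)
    print(g.shape, resubmit_success(x + 0.05 * g.sign(), purify, classifier, y))
```

Under 1-evaluation an attack is scored on a single purification draw, so a defense can appear robust even if `resubmit_success` would return True for many examples after a handful of resubmissions; this is the gap the paper's $N$-evaluation is meant to expose.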

Cite

Text

Wang et al. "DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification." Neural Information Processing Systems, 2024. doi:10.52202/079017-2842

Markdown

[Wang et al. "DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wang2024neurips-diffhammer/) doi:10.52202/079017-2842

BibTeX

@inproceedings{wang2024neurips-diffhammer,
  title     = {{DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification}},
  author    = {Wang, Kaibo and Fu, Xiaowen and Han, Yuxuan and Xiang, Yang},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2842},
  url       = {https://mlanthology.org/neurips/2024/wang2024neurips-diffhammer/}
}