Salient Conditional Diffusion for Backdoors

Abstract

We propose a novel algorithm, Salient Conditional Diffusion (Sancdifi), a state-of-the-art defense against backdoor attacks. Sancdifi uses a denoising diffusion probabilistic model (DDPM) to degrade an image with noise and then recover it. Critically, we compute saliency map-based masks to condition the diffusion, so that the DDPM diffuses the most salient pixels more strongly. As a result, Sancdifi is highly effective at diffusing out triggers in data poisoned by backdoor attacks, while reliably recovering salient features when applied to clean data. Sancdifi is a black-box defense: it requires no access to the trojan network's parameters.
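The noising half of this idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the saliency mask conditions the diffusion, so the per-pixel blend rule, the function names `linear_alpha_bar` and `masked_forward_diffusion`, and the linear beta schedule (a common DDPM default) are all assumptions made for illustration. The recovery step, which needs a trained DDPM, is omitted.

```python
import math
import random


def linear_alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps.

    Assumes a linear beta schedule; the schedule used in the paper is not
    stated in the abstract.
    """
    alpha_bar = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        alpha_bar *= 1.0 - beta
    return alpha_bar


def masked_forward_diffusion(image, mask, t, rng=None):
    """Noise `image` to step t, weighted per pixel by `mask`.

    image: 2D list of floats (pixel values scaled to [-1, 1])
    mask:  2D list of floats in [0, 1]; 1 = salient (fully diffused),
           0 = left untouched
    """
    rng = rng or random.Random(0)
    a_bar = linear_alpha_bar(t)
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    out = []
    for img_row, mask_row in zip(image, mask):
        row = []
        for x0, m in zip(img_row, mask_row):
            # Standard DDPM forward sample from q(x_t | x_0).
            x_t = signal * x0 + noise * rng.gauss(0.0, 1.0)
            # Hypothetical blend rule: salient pixels take the noised value.
            row.append(m * x_t + (1.0 - m) * x0)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[0.5, -0.5], [0.2, 0.8]]
    mask = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a saliency-derived mask
    print(masked_forward_diffusion(image, mask, t=500))
```

In this sketch a pixel with mask value 0 passes through unchanged, while a pixel with mask value 1 is noised exactly as in an unconditional DDPM forward step, so a trigger that falls in a salient region would be the part most heavily degraded before recovery.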

Cite

Text

May et al. "Salient Conditional Diffusion for Backdoors." ICLR 2023 Workshops: BANDS, 2023.

Markdown

[May et al. "Salient Conditional Diffusion for Backdoors." ICLR 2023 Workshops: BANDS, 2023.](https://mlanthology.org/iclrw/2023/may2023iclrw-salient/)

BibTeX

@inproceedings{may2023iclrw-salient,
  title     = {{Salient Conditional Diffusion for Backdoors}},
  author    = {May, Brandon B and Tatro, Norman Joseph and Kumar, Piyush and Shnidman, Nathan},
  booktitle = {ICLR 2023 Workshops: BANDS},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/may2023iclrw-salient/}
}