Removing Backdoor Behaviors with Unlabeled Data

Abstract

The increasing computational demand of Deep Neural Networks (DNNs) motivates companies and organizations to outsource the training process. However, outsourcing the training process exposes DNNs to backdoor attacks. It is necessary to defend against such attacks, i.e., to design a training strategy or post-process a trained suspicious model so that the backdoor behavior of the model is mitigated while its normal predictive performance on clean inputs is preserved. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such samples are usually unavailable in the real world, rendering existing methods inapplicable. In this paper, we argue that, to mitigate backdoors, (1) labeled data may not be necessary and (2) in-distribution data may not be needed. Through a carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively remove the backdoor behaviors of a suspicious network with negligible compromise to its normal behavior. In experiments, we compare our framework with six backdoor defense methods that use labeled data, against six state-of-the-art backdoor attacks. The experiments show that our framework achieves comparable results even when given only out-of-distribution data.
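The sketch below illustrates one plausible reading of the two ingredients named in the abstract: a copy of the suspicious model has some of its layers re-initialized, and that copy is then trained by knowledge distillation against the original model's soft predictions on unlabeled (possibly out-of-distribution) data. The choice of which layers to reset, the temperature, and the optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of layer-wise re-initialization + knowledge distillation
# on unlabeled data; layer selection and hyperparameters are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def reinitialize_layers(model: nn.Module, layer_names: list) -> nn.Module:
    """Return a copy of `model` with the named submodules' weights re-initialized."""
    student = copy.deepcopy(model)
    for name, module in student.named_modules():
        if name in layer_names and hasattr(module, "reset_parameters"):
            module.reset_parameters()  # re-draw weights for the selected layers
    return student


def distill_step(teacher, student, unlabeled_batch, optimizer, temperature=2.0):
    """One distillation step on an unlabeled batch (no ground-truth labels used)."""
    teacher.eval()
    student.train()
    with torch.no_grad():
        t_logits = teacher(unlabeled_batch)
    s_logits = student(unlabeled_batch)
    # Soft-label KL divergence: the student mimics the teacher's predictions
    # on clean, trigger-free inputs, so the backdoor association is not reinforced.
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The intuition behind this reading: re-initializing selected layers discards weights that may encode the trigger response, while distillation on trigger-free unlabeled data restores only the clean input-output mapping.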

Cite

Text

Pang et al. "Removing Backdoor Behaviors with Unlabeled Data." ICLR 2023 Workshops: BANDS, 2023.

Markdown

[Pang et al. "Removing Backdoor Behaviors with Unlabeled Data." ICLR 2023 Workshops: BANDS, 2023.](https://mlanthology.org/iclrw/2023/pang2023iclrw-removing/)

BibTeX

@inproceedings{pang2023iclrw-removing,
  title     = {{Removing Backdoor Behaviors with Unlabeled Data}},
  author    = {Pang, Lu and Sun, Tao and Ling, Haibin and Chen, Chao},
  booktitle = {ICLR 2023 Workshops: BANDS},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/pang2023iclrw-removing/}
}