Effective Backdoor Mitigation Depends on the Pre-Training Objective

Verma, Sahil; Bhatt, Gantavya; Singhal, Soumye; Das, Arnav Mohanty; Shah, Chirag; Dickerson, John P; Bilmes, Jeff

NeurIPSW 2023

Abstract

Despite the remarkable capabilities of current machine learning (ML) models, they remain susceptible to adversarial and backdoor attacks. Models compromised by such attacks are particularly risky to deploy, as they can behave unpredictably in critical situations. Recent work has proposed an algorithm that mitigates the impact of poison in backdoored multimodal models such as CLIP by finetuning them on a clean subset of image-text pairs using a combination of contrastive and self-supervised losses. In this work, we show that this model-cleaning approach is not effective when the pre-training objective is changed to a better alternative. We demonstrate this by training multimodal models with this better pre-training objective on two large datasets of 3M (CC3M) and 6M (CC6M) data points. We find that the proposed method is ineffective on both datasets for this pre-training objective, even with an extensive hyperparameter search. Our work brings to light the fact that mitigating the impact of poison in backdoored models is an ongoing research problem that depends heavily on how the model was pre-trained and how the backdoor was introduced. The full version of the paper can be found at https://arxiv.org/abs/2311.14948.
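To make the "combination of contrastive and self-supervised loss" concrete, the following is a minimal sketch of such a combined finetuning objective, not the implementation from the paper or from the prior work it evaluates. It assumes that both terms are symmetric InfoNCE losses, that the self-supervised term contrasts each image and each text with an augmented view of itself, and that the two terms are combined by a simple weighted sum. The function names, the `ssl_weight` parameter, and the temperature value are hypothetical choices made for illustration.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _info_nce(anchors, targets, temperature):
    """Mean cross-entropy of matching anchors[i] to targets[i] among all targets."""
    a = [_normalize(v) for v in anchors]
    t = [_normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp over the row of similarities.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def contrastive_loss(x, y, temperature=0.07):
    """Symmetric InfoNCE between two batches of paired embeddings."""
    return 0.5 * (_info_nce(x, y, temperature) + _info_nce(y, x, temperature))


def cleaning_loss(img, txt, img_aug, txt_aug, ssl_weight=1.0, temperature=0.07):
    """Multimodal contrastive term plus a weighted self-supervised term.

    Assumption: the self-supervised term pairs each embedding with the
    embedding of an augmented view of the same image or text.
    """
    multimodal = contrastive_loss(img, txt, temperature)
    self_supervised = 0.5 * (
        contrastive_loss(img, img_aug, temperature)
        + contrastive_loss(txt, txt_aug, temperature)
    )
    return multimodal + ssl_weight * self_supervised
```

In this sketch the embeddings are plain lists of floats standing in for encoder outputs; in practice the same quantities would be computed on batched tensors from the image and text encoders of the model being cleaned.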

Cite

Text

Verma et al. "Effective Backdoor Mitigation Depends on the Pre-Training Objective." NeurIPS 2023 Workshops: BUGS, 2023.

Markdown

[Verma et al. "Effective Backdoor Mitigation Depends on the Pre-Training Objective." NeurIPS 2023 Workshops: BUGS, 2023.](https://mlanthology.org/neuripsw/2023/verma2023neuripsw-effective/)

BibTeX

@inproceedings{verma2023neuripsw-effective,
  title     = {{Effective Backdoor Mitigation Depends on the Pre-Training Objective}},
  author    = {Verma, Sahil and Bhatt, Gantavya and Singhal, Soumye and Das, Arnav Mohanty and Shah, Chirag and Dickerson, John P and Bilmes, Jeff},
  booktitle = {NeurIPS 2023 Workshops: BUGS},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/verma2023neuripsw-effective/}
}