SEER: Backdoor Detection for Vision-Language Models Through Searching Target Text and Image Trigger Jointly

Abstract

This paper proposes SEER, a novel backdoor detection algorithm for vision-language models, addressing the gap in the literature on multi-modal backdoor detection. While backdoor detection in single-modal models has been well studied, defenses for multi-modal models remain largely unexplored. Existing backdoor defense mechanisms cannot be directly applied to multi-modal settings because of the increased complexity and the explosion of the search space when both modalities are involved. We propose to detect backdoors in vision-language models by jointly searching for image triggers and malicious target texts in the feature space shared by the vision and language modalities. Extensive experiments demonstrate that SEER achieves a detection rate of over 92% across various settings, without access to training data or knowledge of downstream tasks.
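To make the joint-search idea concrete, the sketch below illustrates one plausible instantiation: optimizing a small image trigger patch and a soft target-text embedding together so that triggered images align with the candidate target in a shared feature space, with the resulting alignment score used as a detection signal. The encoders (image_encoder, text_projector), the patch size, the loss, and all hyper-parameters are stand-ins for illustration only and are not the exact SEER procedure from the paper.

# Illustrative sketch only (assumed setup, not the authors' exact method):
# jointly search an image trigger and a target-text embedding in a shared
# vision-language feature space, then threshold the achieved alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in frozen encoders; in practice these would be the suspect model's
# vision and text towers projecting into the same embedding space.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
text_projector = nn.Linear(512, 512)
for p in list(image_encoder.parameters()) + list(text_projector.parameters()):
    p.requires_grad_(False)

# Learnable trigger patch and learnable "soft" target-text embedding.
patch = torch.zeros(3, 32, 32, requires_grad=True)
target_text = torch.randn(512, requires_grad=True)
optimizer = torch.optim.Adam([patch, target_text], lr=0.05)

clean_images = torch.rand(16, 3, 224, 224)  # surrogate clean inputs

for step in range(200):
    stamped = clean_images.clone()
    stamped[:, :, :32, :32] = patch.clamp(0, 1)  # stamp the trigger patch

    img_feat = F.normalize(image_encoder(stamped), dim=-1)
    txt_feat = F.normalize(text_projector(target_text), dim=-1)

    # Push all triggered images toward the candidate target text; a backdoored
    # model admits a small trigger with unusually high alignment.
    loss = 1.0 - (img_feat @ txt_feat).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

alignment = 1.0 - loss.item()
print(f"best trigger/text alignment: {alignment:.3f}")

In this hypothetical setup, an unusually high alignment score for a small, easily found trigger would flag the model as backdoored, while a clean model would not admit such a trigger; the actual scoring and thresholding used by SEER are described in the paper.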

Cite

Text

Zhu et al. "SEER: Backdoor Detection for Vision-Language Models Through Searching Target Text and Image Trigger Jointly." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I7.28611

Markdown

[Zhu et al. "SEER: Backdoor Detection for Vision-Language Models Through Searching Target Text and Image Trigger Jointly." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/zhu2024aaai-seer/) doi:10.1609/AAAI.V38I7.28611

BibTeX

@inproceedings{zhu2024aaai-seer,
  title     = {{SEER: Backdoor Detection for Vision-Language Models Through Searching Target Text and Image Trigger Jointly}},
  author    = {Zhu, Liuwan and Ning, Rui and Li, Jiang and Xin, Chunsheng and Wu, Hongyi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {7766--7774},
  doi       = {10.1609/AAAI.V38I7.28611},
  url       = {https://mlanthology.org/aaai/2024/zhu2024aaai-seer/}
}