Adversarial Attention Deficit: Fooling Deformable Vision Transformers with Collaborative Adversarial Patches
Abstract
Deformable vision transformers reduce the expensive quadratic time complexity of attention modeling by using sparse attention structures, making it possible to deploy transformers in large-scale vision applications such as multi-view vision systems. We show that existing adversarial attacks against conventional vision transformers do not transfer to deformable transformers, primarily due to the data-dependent, dynamic nature of their sparse attention. In this work we present, for the first time, adversarial attacks against deformable vision transformers by taking control of their attention-inferring module. We develop a novel collaborative attack in which a source patch manipulates attention to point to a target patch containing the adversarial noise that fools the model. Our attack alters less than 1% of the input field (the patched area) yet completely disrupts object detection, resulting in 0% AP for single-view object detection on MS COCO and 0% MODA for multi-view object detection on Wildtrack.
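To illustrate why the sparse attention described here is data-dependent (and hence a target for the attack), the following is a minimal, single-head, single-scale sketch of deformable-attention-style sampling in PyTorch. The class and projection names (`DeformableAttentionSketch`, `offset_proj`, `weight_proj`) are illustrative assumptions, not the paper's code or a library API; the sketch only shows that sampling locations and attention weights are predicted from the query features, which is the "attention-inferring" behavior the collaborative patches exploit.

```python
# Minimal single-head, single-scale sketch of deformable-attention-style
# sparse sampling (in the spirit of Deformable DETR). Illustrative only;
# names and shapes are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableAttentionSketch(nn.Module):
    def __init__(self, dim: int = 256, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        # "Attention-inferring" projections: sampling offsets and weights
        # are predicted from the query itself, so where attention looks
        # depends on the input (data-dependent sparsity).
        self.offset_proj = nn.Linear(dim, n_points * 2)
        self.weight_proj = nn.Linear(dim, n_points)
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, ref_points, value_map):
        # query:      (B, Nq, C)   query features
        # ref_points: (B, Nq, 2)   reference points in [0, 1] x [0, 1]
        # value_map:  (B, C, H, W) feature map to sample from
        B, Nq, C = query.shape
        H, W = value_map.shape[-2:]
        v = self.value_proj(value_map.flatten(2).transpose(1, 2))  # (B, HW, C)
        v = v.transpose(1, 2).reshape(B, C, H, W)

        # Predict K sampling offsets and attention weights per query.
        offsets = self.offset_proj(query).view(B, Nq, self.n_points, 2)
        weights = self.weight_proj(query).softmax(-1)               # (B, Nq, K)

        # Sampling locations = reference point + predicted offsets,
        # mapped to grid_sample's [-1, 1] coordinate convention.
        locs = (ref_points.unsqueeze(2) + offsets).clamp(0, 1) * 2 - 1

        # Sparse sampling: only K points per query are read from the map,
        # instead of attending densely over all H*W positions.
        sampled = F.grid_sample(
            v, locs, mode="bilinear", align_corners=False
        )                                                            # (B, C, Nq, K)

        out = (sampled * weights.unsqueeze(1)).sum(-1)               # (B, C, Nq)
        return self.out_proj(out.transpose(1, 2))                    # (B, Nq, C)
```

Because the offsets and weights are functions of the input features, adversarial content in one region (a source patch) can, in principle, steer the sampled locations toward another region (a target patch carrying the adversarial noise), which is the collaboration the abstract describes.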
Cite
Text
Alam et al. "Adversarial Attention Deficit: Fooling Deformable Vision Transformers with Collaborative Adversarial Patches." Winter Conference on Applications of Computer Vision, 2025.
Markdown
[Alam et al. "Adversarial Attention Deficit: Fooling Deformable Vision Transformers with Collaborative Adversarial Patches." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/alam2025wacv-adversarial/)
BibTeX
@inproceedings{alam2025wacv-adversarial,
  title     = {{Adversarial Attention Deficit: Fooling Deformable Vision Transformers with Collaborative Adversarial Patches}},
  author    = {Alam, Quazi Mishkatul and Tarchoun, Bilel and Alouani, Ihsen and Abu-Ghazaleh, Nael},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {7123--7132},
  url       = {https://mlanthology.org/wacv/2025/alam2025wacv-adversarial/}
}