Backdoor Attacks Against Transformers with Attention Enhancement

Abstract

With the popularity of transformers in natural language processing (NLP) applications, there are growing concerns about their security. Most existing NLP attack methods inject stealthy trigger words/phrases. In this paper, we turn instead to the interior structure of neural networks and the Trojan mechanism. Targeting the prominent transformer models in NLP, we propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention pattern. TAL significantly improves attack efficacy: it achieves higher attack success rates while requiring a much smaller poisoning rate (i.e., a smaller proportion of poisoned samples). It boosts attack efficacy not only for traditional dirty-label attacks but also for the more challenging clean-label attacks. TAL is compatible with existing attack methods and can be easily adapted to different backbone transformer models.
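The abstract does not spell out the loss itself, so as a rough illustration, here is a minimal sketch of how an attention-manipulation loss of this kind could look with PyTorch and a Hugging Face-style transformer, assuming TAL rewards attention mass on trigger-token positions in poisoned samples. The function name `trojan_attention_loss` and the `trigger_mask` input are hypothetical, not taken from the paper.

```python
import torch

def trojan_attention_loss(attentions, trigger_mask):
    """Hypothetical sketch of a Trojan-attention-style loss.

    Encourages attention heads to concentrate on trigger-token
    positions in poisoned samples. `attentions` is the tuple of
    per-layer attention tensors returned by a Hugging Face model
    called with output_attentions=True, each of shape
    (batch, heads, seq_len, seq_len); `trigger_mask` is a
    (batch, seq_len) 0/1 tensor marking trigger-token positions.
    """
    mask = trigger_mask.float()
    losses = []
    for attn in attentions:  # one tensor per transformer layer
        # Attention mass each query token sends to trigger positions,
        # averaged over heads and query positions: shape (batch,)
        to_trigger = (attn * mask[:, None, None, :]).sum(-1).mean(dim=(1, 2))
        # Maximize attention on triggers -> minimize its negative.
        losses.append(-to_trigger.mean())
    return torch.stack(losses).mean()


# Usage sketch inside a poisoning training loop (names hypothetical):
# outputs = model(input_ids, attention_mask=..., output_attentions=True)
# loss = task_loss(outputs.logits, labels) \
#        + lam * trojan_attention_loss(outputs.attentions, trigger_mask)
```

In this reading, the auxiliary term is added (weighted by a coefficient such as `lam`) to the ordinary task loss only on poisoned samples, steering attention toward the trigger tokens during fine-tuning; consult the paper for the actual formulation.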

Cite

Text

Lyu et al. "Backdoor Attacks Against Transformers with Attention Enhancement." ICLR 2023 Workshops: BANDS, 2023.

Markdown

[Lyu et al. "Backdoor Attacks Against Transformers with Attention Enhancement." ICLR 2023 Workshops: BANDS, 2023.](https://mlanthology.org/iclrw/2023/lyu2023iclrw-backdoor/)

BibTeX

@inproceedings{lyu2023iclrw-backdoor,
  title     = {{Backdoor Attacks Against Transformers with Attention Enhancement}},
  author    = {Lyu, Weimin and Zheng, Songzhu and Ling, Haibin and Chen, Chao},
  booktitle = {ICLR 2023 Workshops: BANDS},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/lyu2023iclrw-backdoor/}
}