BadNL: Backdoor Attacks Against NLP Models

Abstract

Deep Neural Networks (DNNs) have progressed rapidly during the past decade. Meanwhile, DNN models have been shown to be vulnerable to various security and privacy attacks. One such attack that has attracted a great deal of attention recently is the backdoor attack. Previous backdoor attacks mainly focus on computer vision tasks. In this paper, we perform the first systematic investigation of the backdoor attack against natural language processing (NLP) models with a focus on sentiment analysis task. Specifically, we propose three methods to construct triggers, including Word-level, Char-level, and Sentence-level triggers. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using the Word-level triggers, our backdoor attack achieves a 100% attack success rate with only a utility drop of 0.18%, 1.26%, and 0.19% on three benchmark sentiment analysis datasets.

Cite

Text

Chen et al. "BadNL: Backdoor Attacks Against NLP Models." ICML 2021 Workshops: AML, 2021.

Markdown

[Chen et al. "BadNL: Backdoor Attacks Against NLP Models." ICML 2021 Workshops: AML, 2021.](https://mlanthology.org/icmlw/2021/chen2021icmlw-badnl/)

BibTeX

@inproceedings{chen2021icmlw-badnl,
  title     = {{BadNL: Backdoor Attacks Against NLP Models}},
  author    = {Chen, Xiaoyi and Salem, Ahmed and Backes, Michael and Ma, Shiqing and Zhang, Yang},
  booktitle = {ICML 2021 Workshops: AML},
  year      = {2021},
  url       = {https://mlanthology.org/icmlw/2021/chen2021icmlw-badnl/}
}