Towards Semantics- and Domain-Aware Adversarial Attacks

Jianping Zhang, Yung-Chieh Huang, Weibin Wu, Michael R. Lyu

IJCAI 2023 pp. 536-544

doi:10.24963/IJCAI.2023/60

Abstract

Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pre-training loss in circular order. Comprehensive experimental comparisons confirm the superiority of our approach. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3% improvement in attack success rates and 9.8% improvement in the quality of adversarial examples.
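The greedy replacement step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the victim model (`toy_margin`), the substitute source (`toy_suggest`), and the deletion-based importance score are all hypothetical stand-ins chosen for illustration. In the paper the substitutes come from a language model trained with contrastive learning and in-domain pre-training, which this toy lookup table does not model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy victim: per-word sentiment weights (assumption, not from the paper).
WEIGHTS: Dict[str, float] = {"great": 2.0, "enjoyable": 0.4, "fine": 0.3, "slow": -0.5}

def toy_margin(words: List[str]) -> float:
    """Confidence margin for the original (positive) label; > 0 means still positive."""
    return sum(WEIGHTS.get(w, 0.0) for w in words)

# Hypothetical substitute table standing in for language-model suggestions.
SUBSTITUTES: Dict[str, List[str]] = {"great": ["fine", "enjoyable"]}

def toy_suggest(words: List[str], i: int) -> List[str]:
    """Return candidate replacements for the word at position i."""
    return SUBSTITUTES.get(words[i], [])

def greedy_attack(
    words: List[str],
    margin: Callable[[List[str]], float],
    suggest: Callable[[List[str], int], List[str]],
) -> Optional[List[str]]:
    """Greedily replace important words until the margin is no longer positive.

    Returns the adversarial word list, or None if no label flip is found.
    """
    base = margin(words)
    # Importance of a word = drop in margin when it is deleted (an assumed heuristic).
    importance = [base - margin(words[:i] + words[i + 1:]) for i in range(len(words))]
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)

    current = list(words)
    for i in order:
        best = current
        for candidate in suggest(current, i):
            trial = current[:i] + [candidate] + current[i + 1:]
            # Keep the substitute that lowers the margin the most.
            if margin(trial) < margin(best):
                best = trial
        current = best
        if margin(current) <= 0:
            return current  # the victim no longer predicts the original label
    return None

adversarial = greedy_attack("the movie was great but slow".split(), toy_margin, toy_suggest)
print(adversarial)
```

In this toy setup the attack changes only the most important word ("great"), because that single substitution is already enough to push the margin to zero or below. A realistic version would replace `toy_margin` with queries to the target classifier and `toy_suggest` with top-k predictions from a masked language model.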


Cite

Text

Zhang et al. "Towards Semantics- and Domain-Aware Adversarial Attacks." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/60

Markdown

[Zhang et al. "Towards Semantics- and Domain-Aware Adversarial Attacks." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/zhang2023ijcai-semantics/) doi:10.24963/IJCAI.2023/60

BibTeX

@inproceedings{zhang2023ijcai-semantics,
  title     = {{Towards Semantics- and Domain-Aware Adversarial Attacks}},
  author    = {Zhang, Jianping and Huang, Yung-Chieh and Wu, Weibin and Lyu, Michael R.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {536-544},
  doi       = {10.24963/IJCAI.2023/60},
  url       = {https://mlanthology.org/ijcai/2023/zhang2023ijcai-semantics/}
}