Towards Semantics- and Domain-Aware Adversarial Attacks
Abstract
Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pre-training loss in circular order. Comprehensive experimental comparisons confirm the superiority of our approach. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3% improvement in attack success rates and 9.8% improvement in the quality of adversarial examples.
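The greedy word-replacement step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how word importance is measured, so the sketch assumes a deletion-based score, and the victim model (`toy_true_prob`) and the substitute proposer (`toy_candidates`, a lookup table standing in for the trained language model) are hypothetical toys introduced only for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical toy victim model: probability of the true ("positive") class.
# Integer points keep the arithmetic exact for this illustration.
POINTS = {"great": 4, "good": 3, "fine": 1}

def toy_true_prob(words: List[str]) -> float:
    return (2 + sum(POINTS.get(w, 0) for w in words)) / 10

# Hypothetical stand-in for a language model that proposes substitutes.
def toy_candidates(words: List[str], i: int) -> List[str]:
    table = {"great": ["fine", "decent"], "good": ["fine", "okay"]}
    return table.get(words[i], [])

def word_importance(words: List[str], true_prob: Callable) -> List[float]:
    """Assumed importance score: drop in true-class probability when a word is deleted."""
    base = true_prob(words)
    return [base - true_prob(words[:i] + words[i + 1:]) for i in range(len(words))]

def greedy_attack(words: List[str], true_prob: Callable, candidates: Callable,
                  threshold: float = 0.5) -> Optional[List[str]]:
    """Replace words in decreasing order of importance until the
    true-class probability falls below `threshold`; return None on failure."""
    adv = list(words)
    scores = word_importance(adv, true_prob)
    order = sorted(range(len(adv)), key=lambda i: scores[i], reverse=True)
    for i in order:
        best_word, best_p = adv[i], true_prob(adv)
        for cand in candidates(adv, i):
            p = true_prob(adv[:i] + [cand] + adv[i + 1:])
            if p < best_p:  # keep the substitute that hurts the model most
                best_word, best_p = cand, p
        adv[i] = best_word
        if best_p < threshold:
            return adv
    return None

if __name__ == "__main__":
    print(greedy_attack(["a", "great", "and", "good", "film"],
                        toy_true_prob, toy_candidates))
```

In this toy setting the quality of the adversarial sentence depends entirely on what the candidate proposer returns, which is the component the abstract says is made semantics- and domain-aware.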
Cite
Text
Zhang et al. "Towards Semantics- and Domain-Aware Adversarial Attacks." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/60
Markdown
[Zhang et al. "Towards Semantics- and Domain-Aware Adversarial Attacks." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/zhang2023ijcai-semantics/) doi:10.24963/IJCAI.2023/60
BibTeX
@inproceedings{zhang2023ijcai-semantics,
title = {{Towards Semantics- and Domain-Aware Adversarial Attacks}},
author = {Zhang, Jianping and Huang, Yung-Chieh and Wu, Weibin and Lyu, Michael R.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {536-544},
doi = {10.24963/IJCAI.2023/60},
url = {https://mlanthology.org/ijcai/2023/zhang2023ijcai-semantics/}
}