Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

Abstract

We present a probabilistic framework for studying adversarial attacks on discrete data. Based on this framework, we derive a perturbation-based method, Greedy Attack, and a scalable learning-based method, Gumbel Attack, that illustrate various tradeoffs in the design of attacks. We demonstrate the effectiveness of these methods using both quantitative metrics and human evaluation on various state-of-the-art models for text classification, including a word-based CNN, a character-based CNN, and an LSTM. As an example of our results, we show that the accuracy of character-based convolutional networks drops to the level of random selection when only five characters are modified through Greedy Attack.
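To illustrate the flavor of a perturbation-based attack on discrete data, here is a minimal sketch of a greedy character-substitution attack. This is not the authors' exact algorithm: the `toy_score` function is a hypothetical stand-in for a model's confidence in the correct class, and the search simply picks, at each step, the single (position, character) substitution that most lowers that score, up to a fixed budget.

```python
# Hedged sketch of a greedy substitution attack on discrete input.
# `toy_score` is a hypothetical stand-in for a classifier's confidence;
# a real attack would query the model under attack instead.
import string


def toy_score(text: str) -> float:
    """Stand-in for model confidence in the correct class:
    here, the fraction of vowels in the text."""
    vowels = set("aeiou")
    return sum(c in vowels for c in text) / max(len(text), 1)


def greedy_attack(text: str, score_fn, budget: int = 5,
                  alphabet: str = string.ascii_lowercase) -> str:
    """Greedily replace up to `budget` characters, each time choosing
    the single substitution that most lowers `score_fn`."""
    current = list(text)
    for _ in range(budget):
        best_score, best_pos, best_char = score_fn("".join(current)), None, None
        for i, orig in enumerate(current):
            for c in alphabet:
                if c == orig:
                    continue
                current[i] = c  # try the substitution in place
                s = score_fn("".join(current))
                if s < best_score:
                    best_score, best_pos, best_char = s, i, c
            current[i] = orig  # restore before trying the next position
        if best_pos is None:  # no substitution lowers the score further
            break
        current[best_pos] = best_char
    return "".join(current)


original = "adversarial"
attacked = greedy_attack(original, toy_score, budget=5)
```

With a budget of five, the attacked string differs from the original in at most five positions, mirroring the five-character modification regime reported in the abstract. The exhaustive inner loop costs one model query per candidate substitution, which is the kind of overhead the scalable, learning-based Gumbel Attack is designed to avoid.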

Cite

Text

Yang et al. "Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data." Journal of Machine Learning Research, 2020.

Markdown

[Yang et al. "Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data." Journal of Machine Learning Research, 2020.](https://mlanthology.org/jmlr/2020/yang2020jmlr-greedy/)

BibTeX

@article{yang2020jmlr-greedy,
  title     = {{Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data}},
  author    = {Yang, Puyudi and Chen, Jianbo and Hsieh, Cho-Jui and Wang, Jane-Ling and Jordan, Michael I.},
  journal   = {Journal of Machine Learning Research},
  year      = {2020},
  pages     = {1--36},
  volume    = {21},
  url       = {https://mlanthology.org/jmlr/2020/yang2020jmlr-greedy/}
}