Improving Black-Box Adversarial Attacks with a Transfer-Based Prior

Abstract

We consider the black-box adversarial setting, where the adversary must generate adversarial perturbations without access to the target model's gradients. Previous methods approximated the gradient either by using the transfer gradient of a surrogate white-box model or by relying on query feedback. However, these methods often suffer from low attack success rates or poor query efficiency, since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method for black-box adversarial attacks, which exploits a transfer-based prior and query information simultaneously. The transfer-based prior, given by the gradient of a surrogate model, is appropriately integrated into our algorithm through an optimal coefficient derived from a theoretical analysis. Extensive experiments demonstrate that our method requires far fewer queries to attack black-box models and achieves higher success rates than alternative state-of-the-art methods.
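
To make the prior-guided sampling concrete, below is a minimal NumPy sketch of a random gradient-free estimator biased toward a surrogate gradient. It assumes a query-only black-box loss loss_fn and a surrogate gradient transfer_grad; these names and the fixed coefficient lam are illustrative (the paper derives an optimal coefficient), not the authors' implementation.

    import numpy as np

    def p_rgf_gradient(loss_fn, x, transfer_grad, num_queries=50,
                       sigma=1e-4, lam=0.5):
        """Sketch of a prior-guided random gradient-free estimator.

        loss_fn       -- black-box loss, accessible only through queries (hypothetical)
        transfer_grad -- gradient of a surrogate white-box model (the prior)
        lam           -- weight placed on the transfer-based prior, fixed here for illustration
        """
        v = transfer_grad / np.linalg.norm(transfer_grad)   # normalized prior direction
        grad_est = np.zeros_like(x)
        base_loss = loss_fn(x)
        for _ in range(num_queries):
            # Draw a random direction and keep only its component orthogonal to the prior.
            xi = np.random.randn(*x.shape)
            xi -= np.dot(xi.ravel(), v.ravel()) * v
            xi /= np.linalg.norm(xi)
            # Biased sampling: interpolate between the prior and the random direction.
            u = np.sqrt(lam) * v + np.sqrt(1.0 - lam) * xi
            # Finite-difference estimate of the directional derivative along u.
            grad_est += (loss_fn(x + sigma * u) - base_loss) / sigma * u
        return grad_est / num_queries

The estimated gradient can then drive a standard iterative attack (e.g., a projected gradient step on the input); larger lam trusts the surrogate more, while lam = 0 recovers a purely query-based estimator.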

Cite

Text

Cheng et al. "Improving Black-Box Adversarial Attacks with a Transfer-Based Prior." Neural Information Processing Systems, 2019.

Markdown

[Cheng et al. "Improving Black-Box Adversarial Attacks with a Transfer-Based Prior." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/cheng2019neurips-improving/)

BibTeX

@inproceedings{cheng2019neurips-improving,
  title     = {{Improving Black-Box Adversarial Attacks with a Transfer-Based Prior}},
  author    = {Cheng, Shuyu and Dong, Yinpeng and Pang, Tianyu and Su, Hang and Zhu, Jun},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {10934--10944},
  url       = {https://mlanthology.org/neurips/2019/cheng2019neurips-improving/}
}