Provably Robust Adversarial Examples
Abstract
We introduce the concept of provably robust adversarial examples for deep neural networks: connected input regions constructed from standard adversarial examples that are guaranteed to be robust to a set of real-world perturbations (such as changes in pixel intensity and geometric transformations). We present a novel method called PARADE for generating these regions in a scalable manner: it iteratively refines a region initially obtained via sampling until the refined region is certified as adversarial by existing state-of-the-art verifiers. At each step, a novel optimization procedure is applied to maximize the region's volume under the constraint that the convex relaxation of the network's behavior with respect to the region implies a chosen bound on the certification objective. Our experimental evaluation shows the effectiveness of PARADE: it successfully finds large provably robust regions, including ones containing $\approx 10^{573}$ adversarial examples for pixel intensity perturbations and $\approx 10^{599}$ for geometric perturbations. The provability makes our robust examples significantly more effective against state-of-the-art defenses based on randomized smoothing than the individual attacks used to construct the regions.
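The abstract describes PARADE's certify-and-refine loop only at a high level. The following is a minimal Python sketch of such a loop under stated assumptions: the L-infinity box representation, the `verifier.certifies_adversarial` call, and the simple multiplicative grow/shrink schedule are hypothetical stand-ins, since the paper's actual method maximizes region volume under a convex-relaxation constraint rather than expanding the box uniformly.

```python
import numpy as np

# Hypothetical sketch of a PARADE-style certify-and-refine loop. The verifier
# object and its certifies_adversarial method are illustrative placeholders,
# not the authors' implementation.

def parade_sketch(x_adv, model, verifier, steps=100, grow=1.05, shrink=0.5):
    """Grow an L-infinity box around an adversarial example x_adv, keeping
    only candidate regions the verifier certifies as entirely adversarial."""
    lo, hi = x_adv.copy(), x_adv.copy()  # degenerate box containing x_adv
    eps = 1e-3  # initial per-pixel expansion radius (arbitrary starting guess)
    for _ in range(steps):
        # Candidate region: enlarge the current box, clipped to valid pixels.
        cand_lo = np.clip(lo - eps, 0.0, 1.0)
        cand_hi = np.clip(hi + eps, 0.0, 1.0)
        # Ask the verifier whether its convex relaxation of the network over
        # the candidate box still implies the misclassification objective.
        if verifier.certifies_adversarial(model, cand_lo, cand_hi):
            lo, hi = cand_lo, cand_hi   # accept: the certified region grew
            eps *= grow
        else:
            eps *= shrink               # reject: refine with a smaller step
    return lo, hi  # certified box: every point inside is adversarial
```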
Cite
Text
Dimitrov et al. "Provably Robust Adversarial Examples." International Conference on Learning Representations, 2022.
Markdown
[Dimitrov et al. "Provably Robust Adversarial Examples." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/dimitrov2022iclr-provably/)
BibTeX
@inproceedings{dimitrov2022iclr-provably,
title = {{Provably Robust Adversarial Examples}},
author = {Dimitrov, Dimitar Iliev and Singh, Gagandeep and Gehr, Timon and Vechev, Martin},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://mlanthology.org/iclr/2022/dimitrov2022iclr-provably/}
}