VIP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers
Abstract
Patch attack, which introduces a perceptible but localized change to the input image, has gained significant momentum in recent years. In this paper, we propose a unified framework to analyze certified patch defense tasks (including both certified detection and certified recovery) using the recently emerged vision transformer. In addition to the existing patch defense setting where only one patch is considered, we provide the very first study on developing certified detection against the dual patch attack, in which the attacker is allowed to adversarially manipulate pixels in two different regions. Benefiting from the recent progress in self-supervised vision transformers (i.e., masked autoencoder), our method achieves state-of-the-art performance in both certified detection and certified recovery of adversarial patches. For certified detection, we improve the performance by up to approximately 16% on ImageNet without additional training for a single adversarial patch, and for the first time, can also tackle the more challenging dual patch setting. Our method largely closes the gap between detection-based certified robustness and clean image accuracy. For certified recovery, our approach improves certified accuracy by approximately 2% on ImageNet across all attack sizes, attaining the new state-of-the-art performance.
Cite
Text
Li et al. "VIP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19806-9_33
Markdown
[Li et al. "VIP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/li2022eccv-vip/) doi:10.1007/978-3-031-19806-9_33
BibTeX
@inproceedings{li2022eccv-vip,
title = {{VIP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers}},
author = {Li, Junbo and Zhang, Huan and Xie, Cihang},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19806-9_33},
url = {https://mlanthology.org/eccv/2022/li2022eccv-vip/}
}