PubDef: Defending Against Transfer Attacks from Public Models
Abstract
Adversarial attacks have been a looming and unaddressed threat in the industry. However, through a decade-long history of the robustness evaluation literature, we have learned that mounting a strong or optimal attack is challenging. It requires both machine learning and domain expertise. In other words, the white-box threat model, religiously assumed by a large majority of the past literature, is unrealistic. In this paper, we propose a new practical threat model where the adversary relies on transfer attacks through publicly available surrogate models. We argue that this setting will become the most prevalent for security-sensitive applications in the future. We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective. The defenses are evaluated under 24 public models and 11 attack algorithms across three datasets (CIFAR-10, CIFAR-100, and ImageNet). Under this threat model, our defense, PubDef, outperforms the state-of-the-art white-box adversarial training by a large margin with almost no loss in normal accuracy. For instance, on ImageNet, our defense achieves 62% accuracy under the strongest transfer attack vs. only 36% for the best adversarially trained model. Its accuracy when not under attack is only 2% lower than that of an undefended model (78% vs. 80%). We release our code at https://github.com/wagner-group/pubdef.
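To make the threat model concrete, the following is a minimal sketch of a transfer attack, not the paper's method or code. It assumes two hypothetical toy logistic-regression classifiers with hand-picked weights: a public "surrogate" that the adversary can inspect and a private "target" that it cannot. The adversary runs a single FGSM step (one well-known attack, chosen here only for brevity) on the surrogate and submits the result to the target. All names, weights, and the budget `eps` are illustrative assumptions, and PubDef's defense itself is not shown.

```python
import math

def score(w, b, x):
    """Linear logit w.x + b of a toy binary classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Predicted label in {0, 1}."""
    return 1 if score(w, b, x) > 0 else 0

def sign(v):
    """Sign of v as -1, 0, or 1."""
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step against a logistic model (w, b).

    For the logistic loss, the gradient with respect to the input
    is (sigmoid(w.x + b) - y) * w; FGSM moves each coordinate by
    eps in the direction of the gradient's sign.
    """
    p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical public surrogate: the adversary has full access to it.
surrogate_w, surrogate_b = [1.0, -2.0, 0.5], 0.1
# Hypothetical private target: similar to the surrogate, but its
# weights are never used when crafting the attack.
target_w, target_b = [0.8, -1.5, 0.7], 0.0

x, y = [0.5, -0.2, 0.3], 1   # clean input and its true label
eps = 0.5                    # L-infinity perturbation budget (assumed)

# The perturbation is computed from the surrogate only.
x_adv = fgsm(surrogate_w, surrogate_b, x, y, eps)

clean_pred = predict(target_w, target_b, x)
adv_pred = predict(target_w, target_b, x_adv)
print("target prediction on clean input:", clean_pred)
print("target prediction on transferred input:", adv_pred)
```

In this toy setup the perturbation transfers because the two models have similar weights. The threat model described above rests on the same assumption at scale: the adversary's public surrogates resemble the deployed model closely enough for attacks to carry over.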
Cite
Text
Sitawarin et al. "PubDef: Defending Against Transfer Attacks from Public Models." International Conference on Learning Representations, 2024.
Markdown
[Sitawarin et al. "PubDef: Defending Against Transfer Attacks from Public Models." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/sitawarin2024iclr-pubdef/)
BibTeX
@inproceedings{sitawarin2024iclr-pubdef,
title = {{PubDef: Defending Against Transfer Attacks from Public Models}},
author = {Sitawarin, Chawin and Chang, Jaewon and Huang, David and Altoyan, Wesson and Wagner, David},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/sitawarin2024iclr-pubdef/}
}