On Frank-Wolfe Adversarial Training

Abstract

We develop a theoretical framework for adversarial training (AT) with Frank-Wolfe (FW) optimization (FW-AT) that reveals a geometric connection between the loss landscape and the distortion of $\ell_\infty$ FW attacks (the attack's $\ell_2$ norm). Specifically, we show that high distortion of FW attacks is equivalent to low variation along the attack path. We then demonstrate experimentally, on various deep neural network architectures, that $\ell_\infty$ attacks against robust models achieve near-maximal $\ell_2$ distortion. To demonstrate the utility of our theoretical framework, we develop FW-Adapt, a novel adversarial training algorithm that uses a simple distortion measure to adapt the number of attack steps during training. FW-Adapt provides strong robustness against white- and black-box attacks at lower training times than PGD-AT.
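
The sketch below illustrates the distortion idea from the abstract: the $\ell_2$ norm of an $\ell_\infty$-bounded perturbation, normalized by its maximal value $\epsilon\sqrt{d}$, and a heuristic rule that assigns fewer attack steps when distortion is high. This is a minimal illustration under our own assumptions, not the authors' FW-Adapt implementation; the function names, thresholds, and step range are hypothetical.

```python
# Minimal sketch (not the paper's code): l2 distortion of an l_infty-bounded
# perturbation and a simple rule for adapting the number of attack steps.
import math
import torch

def l2_distortion(delta: torch.Tensor, eps: float) -> float:
    """Ratio of the attack's l2 norm to the maximal l2 norm attainable
    inside the l_infty ball of radius eps (which is eps * sqrt(d))."""
    d = delta.numel()
    return delta.norm(p=2).item() / (eps * math.sqrt(d))

def adapt_num_steps(distortion: float, min_steps: int = 2, max_steps: int = 10) -> int:
    """Illustrative heuristic: high distortion indicates low variation along
    the attack path, so fewer FW steps suffice; low distortion gets more."""
    frac = 1.0 - min(max(distortion, 0.0), 1.0)   # more steps when distortion is low
    return min_steps + round(frac * (max_steps - min_steps))

# Usage with a stand-in perturbation (e.g., from a previous FW attack)
eps = 8.0 / 255.0
delta = torch.empty(3, 32, 32).uniform_(-eps, eps)
rho = l2_distortion(delta, eps)
print(f"distortion {rho:.3f} -> {adapt_num_steps(rho)} FW steps")
```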

Cite

Text

Tsiligkaridis and Roberts. "On Frank-Wolfe Adversarial Training." ICML 2021 Workshops: AML, 2021.

Markdown

[Tsiligkaridis and Roberts. "On Frank-Wolfe Adversarial Training." ICML 2021 Workshops: AML, 2021.](https://mlanthology.org/icmlw/2021/tsiligkaridis2021icmlw-frankwolfe/)

BibTeX

@inproceedings{tsiligkaridis2021icmlw-frankwolfe,
  title     = {{On Frank-Wolfe Adversarial Training}},
  author    = {Tsiligkaridis, Theodoros and Roberts, Jay},
  booktitle = {ICML 2021 Workshops: AML},
  year      = {2021},
  url       = {https://mlanthology.org/icmlw/2021/tsiligkaridis2021icmlw-frankwolfe/}
}