Connecting Certified and Adversarial Training

Abstract

Training certifiably robust neural networks remains a notoriously hard problem. Adversarial training optimizes under-approximations of the worst-case loss, which leads to insufficient regularization for certification. Sound certified training methods, in contrast, optimize loose over-approximations, which leads to over-regularization and poor (standard) accuracy. In this work, we propose TAPS, an (unsound) certified training method that combines IBP and PGD training to optimize more precise, although not necessarily sound, worst-case loss approximations, reducing over-regularization and increasing certified and standard accuracies. Empirically, TAPS achieves a new state-of-the-art in many settings, e.g., reaching a certified accuracy of $22\%$ on TinyImageNet for $\ell_\infty$-perturbations with radius $\epsilon=1/255$. We make our implementation and networks public at https://github.com/eth-sri/taps.
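To make the gap between under- and over-approximation concrete, the following minimal sketch compares the two on a toy ReLU network. It is not the TAPS method and is not taken from the authors' repository: the network weights, the function names (`ibp_bounds`, `sampled_max`), and the use of random sampling as a simple stand-in for a PGD attack are all assumptions made for illustration. Interval bound propagation (IBP) yields a sound upper bound on the output over the $\ell_\infty$-ball, while any attack-style search can only yield a lower estimate of the true worst case.

```python
import random

def affine(W, b, x):
    """Compute W @ x + b for list-based matrices and vectors."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def forward(layers, x):
    """Evaluate the network; ReLU follows every layer except the last."""
    for i, (W, b) in enumerate(layers):
        x = affine(W, b, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def ibp_bounds(layers, lo, hi):
    """Propagate the box [lo, hi] through the network with interval arithmetic."""
    for i, (W, b) in enumerate(layers):
        new_lo, new_hi = [], []
        for row, bi in zip(W, b):
            # Positive weights take the matching bound, negative ones the opposite.
            new_lo.append(bi + sum(w * (l if w >= 0 else h)
                                   for w, l, h in zip(row, lo, hi)))
            new_hi.append(bi + sum(w * (h if w >= 0 else l)
                                   for w, l, h in zip(row, lo, hi)))
        lo, hi = new_lo, new_hi
        if i < len(layers) - 1:
            lo, hi = relu(lo), relu(hi)  # ReLU is monotone, so bounds map directly
    return lo, hi

def sampled_max(layers, x, eps, n_samples=1000, seed=0):
    """Under-approximate the worst-case output by random search in the eps-ball.

    Assumption: random sampling replaces PGD here to keep the sketch short.
    """
    rng = random.Random(seed)
    best = forward(layers, x)[0]
    for _ in range(n_samples):
        x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
        best = max(best, forward(layers, x_adv)[0])
    return best

# Hypothetical toy network: f(x) = relu(x1 - x2) - relu(x1 + x2)
layers = [
    ([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
x, eps = [0.5, 0.5], 0.1

_, ibp_hi = ibp_bounds(layers, [xi - eps for xi in x], [xi + eps for xi in x])
upper = ibp_hi[0]                      # sound over-approximation
lower_est = sampled_max(layers, x, eps)  # under-approximation from search

print("sampled (under-approx.):", lower_est)
print("IBP upper bound (over-approx.):", upper)
```

For this toy network the exact worst-case output over the ball is $-0.8$, whereas IBP reports roughly $-0.6$, and the sampled estimate cannot exceed $-0.8$. The true value lies between the two estimates, which is the gap that methods combining both views aim to narrow.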

Cite

Text

Mao et al. "Connecting Certified and Adversarial Training." Neural Information Processing Systems, 2023.

Markdown

[Mao et al. "Connecting Certified and Adversarial Training." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/mao2023neurips-connecting/)

BibTeX

@inproceedings{mao2023neurips-connecting,
  title     = {{Connecting Certified and Adversarial Training}},
  author    = {Mao, Yuhao and Müller, Mark and Fischer, Marc and Vechev, Martin},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/mao2023neurips-connecting/}
}