Class-Disentanglement and Applications in Adversarial Detection and Defense
Abstract
What is the minimum information a neural network $D(\cdot)$ needs from an image $x$ to accurately predict its class? Extracting this information in the input space from $x$ can locate the regions that $D(\cdot)$ mainly attends to and shed new light on the detection of and defense against adversarial attacks. In this paper, we propose "class-disentanglement", which trains a variational autoencoder $G(\cdot)$ to extract the class-dependent information as $x - G(x)$ via a trade-off between reconstructing $x$ by $G(x)$ and classifying $x$ by $D(x-G(x))$: the former competes with the latter in decomposing $x$, so that the latter retains only the information necessary for classification in $x-G(x)$. Applying this decomposition to both clean images and their adversarial counterparts, we discover that the perturbations generated by adversarial attacks lie mainly in the class-dependent part $x-G(x)$. The decomposition results also provide novel interpretations of classification and attack models. Inspired by these observations, we propose to conduct adversarial detection on $x - G(x)$ and adversarial defense on $G(x)$, which consistently outperform the same procedures applied to the original $x$. In experiments, this simple approach substantially improves the detection of and defense against different types of adversarial attacks.
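A minimal sketch of the training objective described above, not the authors' code: a generator $G$ is trained to reconstruct $x$ while a fixed classifier $D$ must still predict the class from the residual $x - G(x)$. The architectures, the trade-off weight `rho`, and the plain autoencoder standing in for the paper's VAE are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAE(nn.Module):
    """Toy autoencoder G(.) for 32x32 RGB images (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, x):
        return self.dec(self.enc(x))

def class_disentanglement_loss(G, D, x, y, rho=0.1):
    """Trade-off between reconstructing x by G(x) and classifying x by D(x - G(x))."""
    g_x = G(x)                              # class-independent part G(x)
    residual = x - g_x                      # class-dependent part x - G(x)
    rec = F.mse_loss(g_x, x)                # reconstruction term
    cls = F.cross_entropy(D(residual), y)   # classification term on the residual
    return rec + rho * cls                  # rho: hypothetical trade-off weight

# Usage sketch: D is a pretrained, frozen classifier; only G is updated.
G = TinyAE()
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
for p in D.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(G.parameters(), lr=1e-3)
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))  # dummy batch
loss = class_disentanglement_loss(G, D, x, y)
loss.backward()
opt.step()
```

After training, $G(x)$ serves as the class-independent reconstruction used for defense, and the residual $x - G(x)$ is the class-dependent signal used for adversarial detection.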
Cite
Text
Yang et al. "Class-Disentanglement and Applications in Adversarial Detection and Defense." Neural Information Processing Systems, 2021.
Markdown
[Yang et al. "Class-Disentanglement and Applications in Adversarial Detection and Defense." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/yang2021neurips-classdisentanglement/)
BibTeX
@inproceedings{yang2021neurips-classdisentanglement,
title = {{Class-Disentanglement and Applications in Adversarial Detection and Defense}},
author = {Yang, Kaiwen and Zhou, Tianyi and Zhang, Yonggang and Tian, Xinmei and Tao, Dacheng},
booktitle = {Neural Information Processing Systems},
year = {2021},
url = {https://mlanthology.org/neurips/2021/yang2021neurips-classdisentanglement/}
}