Feature Compression Is the Root Cause of Adversarial Fragility in Neural Networks

Abstract

In this paper, we study the adversarial robustness of deep neural networks (NNs) for classification tasks by comparing it against that of optimal classifiers. We measure robustness by the smallest magnitude of an additive perturbation that can change a classifier's output. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that a neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically, we show that a neural network's adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness achieved by optimal classifiers. Our theoretical predictions agree closely with numerical experiments on practically trained NNs, including NNs for ImageNet images. The matrix-theoretic explanation is consistent with an earlier information-theoretic, feature-compression-based explanation for the adversarial fragility of neural networks.
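The $1/\sqrt{d}$ gap can be illustrated with a toy linear example. The sketch below is not the paper's construction; it assumes two class centers at $+\mu$ and $-\mu$ with $\mu$ the all-ones vector in $\mathbb{R}^d$, and compares a linear classifier that uses every feature (the max-margin separator for these two points) with a hypothetical "compressed" classifier that looks at only the first feature. The function names `min_l2_perturbation` and `robustness_ratio` are introduced here for illustration only. For a linear classifier $\mathrm{sign}(w^\top x + b)$, the smallest $\ell_2$ perturbation that reaches the decision boundary is $|w^\top x + b| / \|w\|_2$.

```python
import numpy as np


def min_l2_perturbation(w, b, x):
    """Smallest L2 norm of an additive perturbation that moves x onto the
    decision boundary of the linear classifier sign(w @ x + b)."""
    return abs(w @ x + b) / np.linalg.norm(w)


def robustness_ratio(d):
    """Ratio of compressed-classifier robustness to full-feature robustness
    at the class center mu, in the assumed toy setup."""
    # Assumed toy data: class centers at +mu and -mu, mu = all-ones vector.
    mu = np.ones(d)

    # Full-feature classifier: weights aligned with mu (max-margin here).
    w_full = mu.copy()

    # Hypothetical compressed classifier: keeps only the first feature.
    w_compressed = np.zeros(d)
    w_compressed[0] = 1.0

    r_full = min_l2_perturbation(w_full, 0.0, mu)              # equals sqrt(d)
    r_compressed = min_l2_perturbation(w_compressed, 0.0, mu)  # equals 1
    return r_compressed / r_full


if __name__ == "__main__":
    for d in (4, 100, 10_000):
        print(d, robustness_ratio(d), 1.0 / np.sqrt(d))
```

Both classifiers label the two centers correctly, yet the compressed one can be flipped by a perturbation of norm 1 while the full-feature one requires norm $\sqrt{d}$, so the ratio in this toy case is exactly $1/\sqrt{d}$.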

Cite

Text

Gao et al. "Feature Compression Is the Root Cause of Adversarial Fragility in Neural Networks." International Conference on Learning Representations, 2026.

Markdown

[Gao et al. "Feature Compression Is the Root Cause of Adversarial Fragility in Neural Networks." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/gao2026iclr-feature/)

BibTeX

@inproceedings{gao2026iclr-feature,
  title     = {{Feature Compression Is the Root Cause of Adversarial Fragility in Neural Networks}},
  author    = {Gao, Jingchao and Lu, Ziqing and Mudumbai, Raghu and Wu, Xiaodong and Yi, Jirong and Cho, Myung and Xu, Catherine and Xie, Hui and Xu, Weiyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/gao2026iclr-feature/}
}