Evaluating the Adversarial Robustness of CNNs Layer by Layer

Wang, Yaowen; Cullina, Daniel

TMLR 2026

Abstract

To measure the adversarial robustness of a feature extractor, Bhagoji et al. introduced a distance on example spaces that measures the minimum perturbation of a pair of examples needed to achieve identical feature extractor outputs. They related these distances to the best possible robust accuracy of any classifier using the feature extractor. By viewing the initial layers of a neural network as a feature extractor, this provides a method of attributing the adversarial vulnerability of the classifier as a whole to individual layers. However, this framework views any injective feature extractor as perfectly robust: any bad choices of feature representation can be undone by later layers. Thus, the framework attributes all adversarial vulnerability to the layers that perform dimensionality reduction. Feature spaces at intermediate layers of convolutional neural networks are generally much larger than input spaces, so this methodology provides no information about the contributions of individual layers to the overall robustness of the network. We extend the framework to evaluate feature extractors with high-dimensional output spaces by composing them with a random linear projection to a lower-dimensional space. This yields non-trivial information about the quality of the feature space representations for building an adversarially robust classifier.
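
The argument in the abstract can be illustrated with a toy linear example. The sketch below is not the authors' method for CNNs. It assumes a hypothetical linear feature extractor `W`, an l2 perturbation model, and a collision distance defined as the smallest total perturbation `delta` satisfying `M @ (x1 - x2) == M @ delta`. All names and dimensions are illustrative assumptions.

```python
import numpy as np

def collision_distance(M, x1, x2):
    """Smallest l2 norm of delta such that M @ (x1 - x2) == M @ delta.

    For a linear map M this is the norm of the projection of (x1 - x2)
    onto the row space of M, computed via the pseudoinverse.
    """
    v = x1 - x2
    delta = np.linalg.pinv(M) @ (M @ v)
    return float(np.linalg.norm(delta))

rng = np.random.default_rng(0)
d_in, d_feat, k = 8, 32, 3  # hypothetical input, feature, and projected dimensions

# Hypothetical injective feature extractor: a tall Gaussian matrix (8 -> 32).
W = rng.standard_normal((d_feat, d_in))
# Random linear projection from the feature space to a lower-dimensional space.
P = rng.standard_normal((k, d_feat)) / np.sqrt(k)

x1 = rng.standard_normal(d_in)
x2 = rng.standard_normal(d_in)

input_dist = float(np.linalg.norm(x1 - x2))
# Injective extractor: colliding the features requires colliding the inputs.
dist_injective = collision_distance(W, x1, x2)
# Composed with the projection: a strictly smaller perturbation suffices.
dist_projected = collision_distance(P @ W, x1, x2)

print(input_dist, dist_injective, dist_projected)
```

In this toy setting the injective extractor reports the same distance as the raw input space, so it looks perfectly robust, while the projected extractor reports a smaller distance that depends on how `W` arranges the examples.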

Cite

Text

Wang and Cullina. "Evaluating the Adversarial Robustness of CNNs Layer by Layer." Transactions on Machine Learning Research, 2026.

Markdown

[Wang and Cullina. "Evaluating the Adversarial Robustness of CNNs Layer by Layer." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/wang2026tmlr-evaluating/)

BibTeX

@article{wang2026tmlr-evaluating,
  title     = {{Evaluating the Adversarial Robustness of CNNs Layer by Layer}},
  author    = {Wang, Yaowen and Cullina, Daniel},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/wang2026tmlr-evaluating/}
}