Interpreting Global Perturbation Robustness of Image Models Using Axiomatic Spectral Importance Decomposition

Abstract

Perturbation robustness evaluates the vulnerabilities of models that arise from a variety of perturbations, such as data corruptions and adversarial attacks. Understanding the mechanisms of perturbation robustness is critical for global interpretability. We present a model-agnostic, global mechanistic interpretability method to interpret the perturbation robustness of image models. This research is motivated by two key observations. First, previous global interpretability works, in tandem with robustness benchmarks, *e.g.* mean corruption error (mCE), are not designed to directly interpret the mechanisms of perturbation robustness within image models. Second, we notice that the spectral signal-to-noise ratios (SNR) of perturbed natural images decay exponentially with frequency. This power-law-like decay implies that low-frequency signals are generally more robust than high-frequency signals, yet high classification accuracy cannot be achieved by low-frequency signals alone. By applying Shapley value theory, our method axiomatically quantifies the predictive powers of robust features and non-robust features within an information theory framework. Our method, dubbed **I-ASIDE** (**I**mage **A**xiomatic **S**pectral **I**mportance **D**ecomposition **E**xplanation), provides a unique insight into model robustness mechanisms. We conduct extensive experiments over a variety of vision models pre-trained on ImageNet, including both convolutional neural networks (*e.g.* *AlexNet*, *VGG*, *GoogLeNet/Inception-v1*, *Inception-v3*, *ResNet*, *SqueezeNet*, *RegNet*, *MnasNet*, *MobileNet*, *EfficientNet*, *etc.*) and vision transformers (*e.g.* *ViT*, *Swin Transformer*, and *MaxViT*), to show that **I-ASIDE** not only **measures** perturbation robustness but also **provides interpretations** of its mechanisms.
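
As a rough illustration of the Shapley-value idea mentioned in the abstract, the sketch below computes exact Shapley values for a tiny coalition game whose players are spectral bands. It is not the paper's implementation: the band names, the `toy_scores` table, and the `shapley_values` helper are all hypothetical, and the abstract does not specify the actual characteristic function (described only as information-theoretic), so made-up numbers stand in for it here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a coalition game `value(frozenset) -> float`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy game: three spectral bands and invented scores for each
# subset of bands kept in the input (NOT results from the paper).
bands = ["low", "mid", "high"]
toy_scores = {
    frozenset(): 0.0,
    frozenset({"low"}): 0.5,
    frozenset({"mid"}): 0.2,
    frozenset({"high"}): 0.1,
    frozenset({"low", "mid"}): 0.8,
    frozenset({"low", "high"}): 0.7,
    frozenset({"mid", "high"}): 0.3,
    frozenset({"low", "mid", "high"}): 1.0,
}

phi = shapley_values(bands, toy_scores.__getitem__)
for band in bands:
    print(f"{band}: {phi[band]:.3f}")
```

By the efficiency axiom, the per-band values sum to the score of the full band set minus that of the empty set. Exact enumeration visits every subset, so it is only practical for a small number of bands.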

Cite

Text

Luo et al. "Interpreting Global Perturbation Robustness of Image Models Using Axiomatic Spectral Importance Decomposition." Transactions on Machine Learning Research, 2024.

Markdown

[Luo et al. "Interpreting Global Perturbation Robustness of Image Models Using Axiomatic Spectral Importance Decomposition." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/luo2024tmlr-interpreting/)

BibTeX

@article{luo2024tmlr-interpreting,
  title     = {{Interpreting Global Perturbation Robustness of Image Models Using Axiomatic Spectral Importance Decomposition}},
  author    = {Luo, Roisin and McDermott, James and O'Riordan, Colm},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/luo2024tmlr-interpreting/}
}