The Manifold Hypothesis for Gradient-Based Explanations

Abstract

When do gradient-based explanation algorithms provide perceptually-aligned explanations? We propose a criterion: the feature attributions need to be aligned with the tangent space of the data manifold. To provide evidence for this hypothesis, we introduce a framework based on variational autoencoders that allows us to estimate and generate image manifolds. Through experiments across a range of different datasets – MNIST, EMNIST, CIFAR10, X-ray pneumonia and Diabetic Retinopathy detection – we demonstrate that the more a feature attribution is aligned with the tangent space of the data, the more perceptually-aligned it tends to be. We then show that the attributions provided by popular post-hoc methods such as Integrated Gradients and SmoothGrad are more strongly aligned with the data manifold than the raw gradient. Adversarial training also improves the alignment of model gradients with the data manifold. As a consequence, we suggest that explanation algorithms should actively strive to align their explanations with the data manifold. An extended version of this paper is available at https://arxiv.org/abs/2206.07387.
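
As a rough illustration of the alignment criterion, the sketch below measures how much of an attribution vector lies in a given tangent space. It assumes the tangent space is supplied as the columns of a full-column-rank matrix (for example, the Jacobian of a generative model's decoder at a latent point). The function name `tangent_fraction` and the norm-ratio measure are assumptions made for illustration and are not necessarily the exact procedure used in the paper.

```python
import numpy as np


def tangent_fraction(attribution, tangent_basis):
    """Return ||P a|| / ||a||, where P projects onto span(tangent_basis).

    attribution:   array with d entries in total (e.g. a flattened gradient).
    tangent_basis: (d, k) array whose columns span the tangent space;
                   assumed to have full column rank (hypothetical input).
    """
    a = np.asarray(attribution, dtype=float).ravel()
    basis = np.asarray(tangent_basis, dtype=float)

    norm = np.linalg.norm(a)
    if norm == 0.0:
        # A zero attribution has no direction; report zero alignment.
        return 0.0

    # Orthonormalize the columns so that Q @ Q.T is the orthogonal projector.
    q, _ = np.linalg.qr(basis)
    projected = q @ (q.T @ a)
    return float(np.linalg.norm(projected) / norm)


if __name__ == "__main__":
    # Toy example: the tangent space is the x-y plane inside R^3.
    plane = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.0, 0.0]])
    print(tangent_fraction([3.0, 4.0, 0.0], plane))  # vector inside the plane
    print(tangent_fraction([0.0, 0.0, 5.0], plane))  # vector orthogonal to it
```

A value near 1 means the attribution lies almost entirely in the tangent space, while a value near 0 means it is almost orthogonal to it. Under the hypothesis stated in the abstract, attributions with higher values would tend to look more perceptually aligned.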

Cite

Text

Bordt et al. "The Manifold Hypothesis for Gradient-Based Explanations." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00378

Markdown

[Bordt et al. "The Manifold Hypothesis for Gradient-Based Explanations." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/bordt2023cvprw-manifold/) doi:10.1109/CVPRW59228.2023.00378

BibTeX

@inproceedings{bordt2023cvprw-manifold,
  title     = {{The Manifold Hypothesis for Gradient-Based Explanations}},
  author    = {Bordt, Sebastian and Upadhyay, Uddeshya and Akata, Zeynep and von Luxburg, Ulrike},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {3697--3702},
  doi       = {10.1109/CVPRW59228.2023.00378},
  url       = {https://mlanthology.org/cvprw/2023/bordt2023cvprw-manifold/}
}