Saliency Strikes Back: How Filtering Out High Frequencies Improves White-Box Explanations
Abstract
Attribution methods are a class of explainability (XAI) methods that aim to assess how individual inputs contribute to a model's decision-making process. We identify a significant limitation in one type of attribution method, known as "white-box" methods. Although highly efficient, as we will show, these methods rely on a gradient signal that is often contaminated by high-frequency artifacts. To overcome this limitation, we introduce a new approach called "FORGrad". This simple method filters out these high-frequency artifacts using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently enhances the performance of existing white-box methods, enabling them to compete with more accurate yet computationally demanding "black-box" methods. We anticipate that our research will foster broader adoption of simpler and more efficient white-box methods for explainability, offering a better balance between faithfulness and computational efficiency.
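To make the core idea concrete, here is a minimal sketch of low-pass filtering a gradient-based saliency map in the Fourier domain. The function name forgrad_like_filter, the cutoff parameter (a fraction of the Nyquist frequency), and the hard circular mask are illustrative assumptions, not the paper's exact implementation, which selects an optimal cut-off frequency per model architecture.

import numpy as np

def forgrad_like_filter(saliency, cutoff=0.5):
    # Low-pass filter a 2D saliency map in the Fourier domain.
    # `cutoff` is a fraction of the Nyquist frequency (an assumed
    # parameterization); FORGrad instead tunes an optimal cut-off
    # per architecture.
    h, w = saliency.shape
    # Frequency grids in cycles per pixel, one per axis.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    # Keep only frequencies below the cut-off (0.5 = Nyquist).
    mask = radius <= cutoff * 0.5
    spectrum = np.fft.fft2(saliency)
    filtered = np.fft.ifft2(spectrum * mask)
    return np.real(filtered)

# Usage: smooth a (here synthetic) noisy gradient map.
noisy_map = np.random.rand(224, 224)
smooth_map = forgrad_like_filter(noisy_map, cutoff=0.25)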
Cite
Text
Muzellec et al. "Saliency Strikes Back: How Filtering Out High Frequencies Improves White-Box Explanations." International Conference on Machine Learning, 2024.
Markdown
[Muzellec et al. "Saliency Strikes Back: How Filtering Out High Frequencies Improves White-Box Explanations." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/muzellec2024icml-saliency/)
BibTeX
@inproceedings{muzellec2024icml-saliency,
title = {{Saliency Strikes Back: How Filtering Out High Frequencies Improves White-Box Explanations}},
author = {Muzellec, Sabine and Fel, Thomas and Boutin, Victor and Andéol, Léo and VanRullen, Rufin and Serre, Thomas},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {37041--37075},
volume = {235},
url = {https://mlanthology.org/icml/2024/muzellec2024icml-saliency/}
}