Towards Better Understanding of Gradient-Based Attribution Methods for Deep Neural Networks
Abstract
Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical comparison has been performed in the past. In this work we analyze four gradient-based attribution methods and formally prove conditions of equivalence and approximation between them. By reformulating two of these methods, we construct a unified framework which enables a direct comparison, as well as an easier implementation. Finally, we propose a novel evaluation metric, called Sensitivity-n, and test the gradient-based attribution methods alongside a simple perturbation-based attribution method on several datasets in the domains of image and text classification, using various network architectures.
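As a toy illustration of the kind of property the paper studies (not the paper's implementation), the sketch below computes Gradient*Input attributions for a linear model in NumPy. For a linear model the attributions satisfy completeness, i.e. they sum to the difference between the output at the input and at a zero baseline — the limiting case of Sensitivity-n when all features are perturbed. The model and values are illustrative assumptions.

```python
import numpy as np

def gradient_x_input(w, x):
    """Gradient*Input attributions for a linear model f(x) = w . x.

    The gradient of f with respect to x is simply w, so each
    attribution is w_i * x_i.
    """
    return w * x

# Hypothetical weights and input, for illustration only
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, -0.5])

f = lambda z: w @ z  # the linear model
attributions = gradient_x_input(w, x)

# Completeness check: attributions sum to f(x) - f(baseline)
# with a zero baseline (here f(0) = 0)
assert np.isclose(attributions.sum(), f(x) - f(np.zeros_like(x)))
```

For nonlinear networks the gradient varies with the input, so Gradient*Input only approximates this property; the paper's analysis characterizes when the different gradient-based methods coincide or diverge.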
Cite
Text
Ancona et al. "Towards Better Understanding of Gradient-Based Attribution Methods for Deep Neural Networks." International Conference on Learning Representations, 2018.
Markdown
[Ancona et al. "Towards Better Understanding of Gradient-Based Attribution Methods for Deep Neural Networks." International Conference on Learning Representations, 2018.](https://mlanthology.org/iclr/2018/ancona2018iclr-better/)
BibTeX
@inproceedings{ancona2018iclr-better,
title = {{Towards Better Understanding of Gradient-Based Attribution Methods for Deep Neural Networks}},
author = {Ancona, Marco and Ceolini, Enea and Öztireli, Cengiz and Gross, Markus},
booktitle = {International Conference on Learning Representations},
year = {2018},
url = {https://mlanthology.org/iclr/2018/ancona2018iclr-better/}
}