Influence Decompositions for Neural Network Attribution
Abstract
Methods of neural network attribution have emerged out of a necessity for explanation and accountability in the predictions of black-box neural models. Most approaches use a variation of sensitivity analysis, where individual input variables are perturbed and the downstream effects on some output metric are measured. We demonstrate that a number of critical functional properties are not revealed when only considering lower-order perturbations. Motivated by these shortcomings, we propose a general framework for decomposing the orders of influence that a collection of input variables has on an output classification. These orders are based on the cardinality of input subsets which are perturbed to yield a change in classification. This decomposition can be naturally applied to attribute which input variables rely on higher-order coordination to impact the classification decision. We demonstrate that our approach correctly identifies higher-order attribution on a number of synthetic examples. Additionally, we showcase the differences between attribution in our approach and existing approaches on benchmark networks for MNIST and ImageNet.
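The order-based decomposition described above can be illustrated with a minimal brute-force sketch (not the paper's actual algorithm): for each input variable, search over subsets of increasing cardinality that contain it, and record the smallest subset size whose joint perturbation flips the classification. The `classify` and `perturb` callables below are hypothetical placeholders for a model and a perturbation scheme.

```python
import itertools
import numpy as np

def order_of_influence(classify, x, perturb, max_order=3):
    """Brute-force sketch: for each input index, find the smallest subset
    size (order) at which jointly perturbing a subset containing that index
    changes the classification of x. `classify` maps an input vector to a
    label; `perturb` returns a copy of x perturbed on the given indices.
    Both are assumptions standing in for a real model and perturbation."""
    base_label = classify(x)
    n = len(x)
    orders = {i: None for i in range(n)}       # None = no flip found up to max_order
    for k in range(1, max_order + 1):          # increasing subset cardinality
        for subset in itertools.combinations(range(n), k):
            if classify(perturb(x, subset)) != base_label:
                for i in subset:
                    if orders[i] is None:      # keep the lowest order per variable
                        orders[i] = k
    return orders

# Toy usage: an AND-style rule at input (0, 0). No single-bit flip changes
# the label, but flipping the pair (0, 1) does -- a second-order effect.
classify = lambda v: int(v[0] and v[1])
perturb = lambda v, idx: np.array([1 - v[i] if i in idx else v[i] for i in range(len(v))])
print(order_of_influence(classify, np.array([0, 0]), perturb))  # {0: 2, 1: 2}
```

In this toy case a first-order sensitivity analysis attributes nothing to either variable, while the subset-cardinality view assigns both variables a second-order influence.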
Cite
Text
Reing et al. "Influence Decompositions for Neural Network Attribution." Artificial Intelligence and Statistics, 2021.
Markdown
[Reing et al. "Influence Decompositions for Neural Network Attribution." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/reing2021aistats-influence/)
BibTeX
@inproceedings{reing2021aistats-influence,
title = {{Influence Decompositions for Neural Network Attribution}},
author = {Reing, Kyle and Ver Steeg, Greg and Galstyan, Aram},
booktitle = {Artificial Intelligence and Statistics},
year = {2021},
pages = {2710--2718},
volume = {130},
url = {https://mlanthology.org/aistats/2021/reing2021aistats-influence/}
}