Extracting Local Reasoning Chains of Deep Neural Networks
Abstract
We study how to explain the main steps of inference that a pre-trained deep neural network (DNN) relies on to produce predictions for a (sub)task and its data. This problem is related to network pruning and interpretable machine learning, with the following highlighted differences: (1) fine-tuning of any neurons/filters is forbidden; (2) we target a very high pruning rate, e.g., ≥ 95%, for better interpretability; (3) the interpretation covers the whole inference process on a few samples of a task rather than individual neurons/filters or a single sample. In this paper, we introduce NeuroChains, which extracts the local inference chains by optimizing differentiable sparse scores for the filters and layers; these scores reflect their importance in preserving the outputs on a few samples drawn from a given (sub)task. By removing the filters/layers with small scores, NeuroChains can thereby extract an extremely small sub-network composed of critical filters exactly copied from the original pre-trained DNN. For samples from the same class, we can then visualize the inference pathway in the pre-trained DNN by applying existing interpretation techniques to the retained filters and layers. This reveals how the inference process stitches and integrates the information layer by layer and filter by filter. We provide detailed case studies together with several quantitative analyses over thousands of trials to demonstrate the quality, sparsity, fidelity, and accuracy of the interpretation. In extensive empirical studies on VGG, ResNet, and ViT, NeuroChains significantly enriches the interpretation and makes the inner mechanism of DNNs more transparent.
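The score-then-prune idea in the abstract can be illustrated with a toy sketch. This is not the NeuroChains implementation: the two-layer network, its weights, the sample grid, the squared-error output-preservation loss, the L1 penalty, and the proximal-gradient update are all assumptions chosen so that the example runs with the Python standard library alone. Hidden units stand in for filters, the frozen weights are never updated, and only the per-unit scores are optimized.

```python
import math
from itertools import product

# Frozen toy "pre-trained" network: y(x) = sum_j w2[j] * relu(w1[j] . x).
# Every weight below is a made-up value; none of them is updated.
w1 = [
    [0.7, 0.5, -0.2], [-0.5, 0.1, 0.6], [0.3, -0.8, 0.1], [-0.2, 0.4, 0.9],
    [0.6, 0.6, 0.6], [-0.9, 0.2, -0.3], [0.1, -0.4, 0.8], [0.5, -0.7, -0.5],
]
w2 = [2.0, -1.5, 0.01, 0.02, -0.01, 0.0, 0.015, -0.02]
N_HID = len(w2)


def hidden(x):
    """ReLU activations of the hidden units (stand-ins for filters)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]


def forward(x, gates):
    """Network output with each hidden unit scaled by its gate value."""
    return sum(g * w * h for g, w, h in zip(gates, w2, hidden(x)))


# A small, hypothetical sample set standing in for one (sub)task.
data = [list(p) for p in product((-1.0, 0.0, 1.0), repeat=3)]
# Outputs of the unpruned network, which the gated network should preserve.
targets = [forward(x, [1.0] * N_HID) for x in data]


def learn_scores(lam=0.05, lr=0.1, steps=500):
    """Minimize mean squared output gap + lam * L1(scores) over scores only."""
    scores = [1.0] * N_HID
    for _ in range(steps):
        grad = [0.0] * N_HID
        for x, t in zip(data, targets):
            h = hidden(x)
            err = sum(s * w * hj for s, w, hj in zip(scores, w2, h)) - t
            for j in range(N_HID):
                grad[j] += 2.0 * err * w2[j] * h[j] / len(data)
        for j in range(N_HID):
            # Gradient step on the fidelity term, then soft-thresholding,
            # which is the proximal step for the L1 penalty.
            z = scores[j] - lr * grad[j]
            scores[j] = math.copysign(max(0.0, abs(z) - lr * lam), z)
    return scores


scores = learn_scores()
# Keep units whose score did not collapse to zero.
kept = [j for j, s in enumerate(scores) if abs(s) > 1e-6]
# The extracted sub-network reuses the original weights unchanged:
# retained units get gate 1, removed units get gate 0.
mask = [1.0 if j in kept else 0.0 for j in range(N_HID)]
max_gap = max(abs(forward(x, mask) - t) for x, t in zip(data, targets))

print("kept units:", kept)
print("max output gap:", round(max_gap, 4))
```

With these made-up weights, the two units with large output weights (indices 0 and 1) should be the only ones retained, and the pruned network's outputs stay close to the original ones even though no weight was fine-tuned. The scores are used only to select units; the retained weights are copied as-is, mirroring the "exactly copied" constraint described above.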
Cite
Text
Zhao et al. "Extracting Local Reasoning Chains of Deep Neural Networks." Transactions on Machine Learning Research, 2022.
Markdown
[Zhao et al. "Extracting Local Reasoning Chains of Deep Neural Networks." Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/zhao2022tmlr-extracting/)
BibTeX
@article{zhao2022tmlr-extracting,
title = {{Extracting Local Reasoning Chains of Deep Neural Networks}},
author = {Zhao, Haiyan and Zhou, Tianyi and Long, Guodong and Jiang, Jing and Zhang, Chengqi},
journal = {Transactions on Machine Learning Research},
year = {2022},
url = {https://mlanthology.org/tmlr/2022/zhao2022tmlr-extracting/}
}