Leveraging Sparse Linear Layers for Debuggable Deep Networks
Abstract
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantitatively and via human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
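The core idea (fitting a sparse linear model over fixed deep features so only a few features drive each prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: here random synthetic vectors stand in for learned deep feature representations, and the sparse linear layer is fit with a simple L1-regularized logistic regression via proximal gradient descent (ISTA).

```python
import numpy as np

# Synthetic stand-in for deep feature representations (hypothetical data):
# 200 examples, 50 features, of which only 3 actually determine the label.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -2.0, 1.5]
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrinks entries toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_sparse_linear(X, y, lam=0.1, lr=0.1, steps=2000):
    """L1-regularized logistic regression on fixed features (ISTA)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted probabilities
        grad = X.T @ (p - y) / len(y)            # logistic loss gradient
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

w = fit_sparse_linear(X, y)
nnz = int(np.count_nonzero(w))
acc = float(np.mean((X @ w > 0) == (y > 0.5)))
print(f"nonzero weights: {nnz}/{d}, accuracy: {acc:.2f}")
```

The sparse weight vector is what makes the model debuggable: a prediction can be attributed to the handful of features with nonzero weight, rather than to all 50.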
Cite
Text
Wong et al. "Leveraging Sparse Linear Layers for Debuggable Deep Networks." International Conference on Machine Learning, 2021.
Markdown
[Wong et al. "Leveraging Sparse Linear Layers for Debuggable Deep Networks." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/wong2021icml-leveraging/)
BibTeX
@inproceedings{wong2021icml-leveraging,
title = {{Leveraging Sparse Linear Layers for Debuggable Deep Networks}},
author = {Wong, Eric and Santurkar, Shibani and Madry, Aleksander},
booktitle = {International Conference on Machine Learning},
year = {2021},
pages = {11205--11216},
volume = {139},
url = {https://mlanthology.org/icml/2021/wong2021icml-leveraging/}
}