Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

Abstract

We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.
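The abstract's reliance on linear constraints follows from a general property of ReLU networks: once the on/off pattern of the ReLU units is fixed, the network is an affine function of its input, and the set of inputs sharing that pattern is described by linear inequalities. The sketch below illustrates only that property on a hypothetical toy network; it is not the paper's algorithm. The weights, the function names, and the choice of an L2 projection within a single linear region are all assumptions made for illustration.

```python
# Hypothetical toy network (illustration only): 2 inputs -> 2 ReLU units -> 1 score.
# An input is "accepted" when the score is >= 0.
W1 = [[1.0, 0.0], [0.0, 2.0]]   # hidden-layer weights, one row per unit
B1 = [0.0, -1.0]                # hidden-layer biases
W2 = [1.0, 1.0]                 # output weights
B2 = -4.0                       # output bias


def pre_activations(x):
    """Hidden-layer values before the ReLU is applied."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, B1)]


def score(x):
    """Network output for input x."""
    return sum(w * max(0.0, z) for w, z in zip(W2, pre_activations(x))) + B2


def activation_pattern(x):
    """Which ReLU units are active at x; this identifies the linear region."""
    return tuple(z > 0 for z in pre_activations(x))


def affine_form(pattern):
    """Return (w, b) such that score(x) == w.x + b for every x with this pattern."""
    w = [0.0] * len(W1[0])
    b = B2
    for active, row, b1, w2 in zip(pattern, W1, B1, W2):
        if active:
            w = [wj + w2 * rj for wj, rj in zip(w, row)]
            b += w2 * b1
    return w, b


def smallest_l2_correction(x, margin=0.0):
    """Smallest L2 change to x that lifts the score to `margin`, assuming the
    activation pattern stays fixed. Returns None if the corrected point leaves
    the region, in which case other regions would have to be searched."""
    pattern = activation_pattern(x)
    w, b = affine_form(pattern)
    current = sum(wj * xj for wj, xj in zip(w, x)) + b
    if current >= margin:
        return [0.0] * len(x)          # already accepted, nothing to correct
    norm_sq = sum(wj * wj for wj in w)
    if norm_sq == 0.0:
        return None                    # score is constant in this region
    step = (margin - current) / norm_sq
    delta = [step * wj for wj in w]    # projection onto the hyperplane w.x + b = margin
    corrected = [xj + dj for xj, dj in zip(x, delta)]
    if activation_pattern(corrected) != pattern:
        return None
    return delta


x = [1.0, 1.0]
delta = smallest_l2_correction(x)
corrected = [xj + dj for xj, dj in zip(x, delta)]
```

This sketch yields a single corrected point. The abstract, by contrast, describes corrections that are minimal, stable, and symbolic, which a single point does not capture.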

Cite

Text

Zhang et al. "Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections." Neural Information Processing Systems, 2018.

Markdown

[Zhang et al. "Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/zhang2018neurips-interpreting/)

BibTeX

@inproceedings{zhang2018neurips-interpreting,
  title     = {{Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections}},
  author    = {Zhang, Xin and Solar-Lezama, Armando and Singh, Rishabh},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {4874-4885},
  url       = {https://mlanthology.org/neurips/2018/zhang2018neurips-interpreting/}
}