Input Switched Affine Networks: An RNN Architecture Designed for Interpretability

Abstract

There exist many problem domains where the interpretability of neural network models is essential for deployment. Here we introduce a recurrent architecture composed of input-switched affine transformations; in other words, an RNN without any explicit nonlinearities, but with input-dependent recurrent weights. This simple form allows the RNN to be analyzed via straightforward linear methods: we can exactly characterize the linear contribution of each input to the model predictions; we can use a change of basis to disentangle input, output, and computational hidden unit subspaces; we can fully reverse-engineer the architecture's solution to a simple task. Despite this ease of interpretation, the input switched affine network achieves reasonable performance on a text modeling task, and allows greater computational efficiency than networks with standard nonlinearities.
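
To make the recurrence and the "linear contribution of each input" claim concrete, here is a minimal numpy sketch of an input-switched affine cell with a linear readout, plus the exact per-timestep decomposition of its logits. The variable names, sizes, and random initialization are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

# Sketch of an input-switched affine network (ISAN): each input symbol x_t
# selects an affine update h_t = W_{x_t} h_{t-1} + b_{x_t}; no nonlinearity.
rng = np.random.default_rng(0)
V, N = 5, 8                                              # vocab size, hidden size (assumed)
Ws = 0.9 * rng.standard_normal((V, N, N)) / np.sqrt(N)   # one recurrence matrix per symbol
bs = 0.1 * rng.standard_normal((V, N))                   # one bias vector per symbol
W_ro = rng.standard_normal((V, N)) / np.sqrt(N)          # linear readout to logits
h0 = np.zeros(N)

def isan_forward(tokens):
    """Run the switched-affine recurrence and return the final hidden state."""
    h = h0
    for x in tokens:
        h = Ws[x] @ h + bs[x]
    return h

def per_input_contributions(tokens):
    """Exactly decompose the final logits into one additive term per input step.

    Because the update is affine in h, the final state (with h0 = 0) equals
    sum_s (W_{x_T} ... W_{x_{s+1}}) b_{x_s}, so each input's effect on the
    logits is W_ro @ (W_{x_T} ... W_{x_{s+1}}) b_{x_s}.
    """
    T = len(tokens)
    contribs = []
    for s in range(T):
        v = bs[tokens[s]]
        for t in range(s + 1, T):
            v = Ws[tokens[t]] @ v
        contribs.append(W_ro @ v)
    return np.stack(contribs)        # shape (T, V)

tokens = [0, 3, 1, 4]
logits = W_ro @ isan_forward(tokens)
assert np.allclose(logits, per_input_contributions(tokens).sum(axis=0))
```

The assertion holds exactly (up to floating point) because the model is affine in its hidden state, which is the property the abstract's analysis methods rely on.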

Cite

Text

Foerster et al. "Input Switched Affine Networks: An RNN Architecture Designed for Interpretability." International Conference on Machine Learning, 2017.

Markdown

[Foerster et al. "Input Switched Affine Networks: An RNN Architecture Designed for Interpretability." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/foerster2017icml-input/)

BibTeX

@inproceedings{foerster2017icml-input,
  title     = {{Input Switched Affine Networks: An RNN Architecture Designed for Interpretability}},
  author    = {Foerster, Jakob N. and Gilmer, Justin and Sohl-Dickstein, Jascha and Chorowski, Jan and Sussillo, David},
  booktitle = {International Conference on Machine Learning},
  year      = {2017},
  pages     = {1136--1145},
  volume    = {70},
  url       = {https://mlanthology.org/icml/2017/foerster2017icml-input/}
}