MEME: Generating RNN Model Explanations via Model Extraction

Abstract

Recurrent Neural Networks (RNNs) have achieved remarkable performance on a range of tasks. A key step to further empowering RNN-based approaches is improving their explainability and interpretability. In this work we present MEME: a model extraction approach capable of approximating RNNs with interpretable models represented by human-understandable concepts and their interactions. We demonstrate how MEME can be applied to two multivariate, continuous data case studies: Room Occupation Prediction and In-Hospital Mortality Prediction. Using these case studies, we show how our extracted models can be used to interpret RNNs both locally and globally, by approximating RNN decision-making via interpretable concept interactions.

Cite

Text

Kazhdan et al. "MEME: Generating RNN Model Explanations via Model Extraction." NeurIPS 2020 Workshops: HAMLETS, 2020.

Markdown

[Kazhdan et al. "MEME: Generating RNN Model Explanations via Model Extraction." NeurIPS 2020 Workshops: HAMLETS, 2020.](https://mlanthology.org/neuripsw/2020/kazhdan2020neuripsw-meme/)

BibTeX

@inproceedings{kazhdan2020neuripsw-meme,
  title     = {{MEME: Generating RNN Model Explanations via Model Extraction}},
  author    = {Kazhdan, Dmitry and Dimanov, Botty and Jamnik, Mateja and Liò, Pietro},
  booktitle = {NeurIPS 2020 Workshops: HAMLETS},
  year      = {2020},
  url       = {https://mlanthology.org/neuripsw/2020/kazhdan2020neuripsw-meme/}
}