Model Interpretability Through the Lens of Computational Complexity

Abstract

In spite of several claims stating that some models are more interpretable than others --e.g., "linear models are more interpretable than deep neural networks"-- we still lack a principled notion of interpretability that allows us to formally compare among different classes of models. We make a step towards such a theory by studying whether folklore interpretability claims have a correlate in terms of computational complexity theory. We focus on post-hoc explainability queries that, intuitively, attempt to answer why individual inputs are classified in a certain way by a given model. In a nutshell, we say that a class C1 of models is more interpretable than another class C2, if the computational complexity of answering post-hoc queries for models in C2 is higher than for C1. We prove that this notion provides a good theoretical counterpart to current beliefs on the interpretability of models; in particular, we show that under our definition and assuming standard complexity-theoretical assumptions (such as P!=NP), both linear and tree-based models are strictly more interpretable than neural networks. Our complexity analysis, however, does not provide a clear-cut difference between linear and tree-based models, as we obtain different results depending on the particular post-hoc explanations considered. Finally, by applying a finer complexity analysis based on parameterized complexity, we are able to prove a theoretical result suggesting that shallow neural networks are more interpretable than deeper ones.

PDF NeurIPS Semantic Scholar

Cite

Text

Barceló et al. "Model Interpretability Through the Lens of Computational Complexity." Neural Information Processing Systems, 2020.

Markdown

[Barceló et al. "Model Interpretability Through the Lens of Computational Complexity." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/barcelo2020neurips-model/)

BibTeX

@inproceedings{barcelo2020neurips-model,
  title     = {{Model Interpretability Through the Lens of Computational Complexity}},
  author    = {Barceló, Pablo and Monet, Mikaël and Pérez, Jorge and Subercaseaux, Bernardo},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/barcelo2020neurips-model/}
}