Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

Abstract

We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. Reformulating the Rate-Distortion Explanations (RDE) method for relevance attribution as a constrained optimization problem provides precise control over the sparsity of relevance maps. This enables both a novel multi-rate and a relevance-ordering variant of RDE, which empirically outperform standard RDE and other baseline methods in a well-established comparison test. We showcase several deterministic and stochastic variants of the Frank-Wolfe algorithm and their effectiveness for RDE.
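To make the constrained formulation concrete, the following is a minimal sketch of a vanilla Frank-Wolfe loop with a linear minimization oracle (LMO) over the box-and-budget set {s ∈ [0,1]^n : Σ s_i ≤ k}, which caps the "rate" (sparsity budget) of a relevance map. This is an illustrative stand-in, not the paper's implementation: the quadratic objective below replaces the actual RDE distortion, and the function and variable names (`lmo_ksparse`, `frank_wolfe`, `grad_f`) are hypothetical.

```python
import numpy as np

def lmo_ksparse(grad, k):
    """LMO over {s in [0,1]^n : sum(s) <= k}: a vertex minimizing
    <grad, v> puts 1 on the (at most k) coordinates with the most
    negative gradient entries and 0 elsewhere."""
    v = np.zeros_like(grad)
    idx = np.argsort(grad)[:k]          # k smallest gradient entries
    idx = idx[grad[idx] < 0]            # only where moving to 1 helps
    v[idx] = 1.0
    return v

def frank_wolfe(grad_f, n, k, steps=300):
    """Vanilla Frank-Wolfe with the standard open-loop step size
    gamma_t = 2 / (t + 2); iterates stay feasible by construction."""
    s = np.zeros(n)                     # feasible starting point
    for t in range(steps):
        v = lmo_ksparse(grad_f(s), k)   # call the LMO at the current iterate
        gamma = 2.0 / (t + 2)
        s = (1 - gamma) * s + gamma * v # convex combination stays in the set
    return s
```

Because every iterate is a convex combination of at most `t` polytope vertices, the sparsity of the resulting map is controlled directly by the budget `k` rather than by tuning a regularization weight, which is the practical appeal of the constrained reformulation.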

Cite

Text

Macdonald et al. "Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings." International Conference on Machine Learning, 2022.

Markdown

[Macdonald et al. "Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/macdonald2022icml-interpretable/)

BibTeX

@inproceedings{macdonald2022icml-interpretable,
  title     = {{Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings}},
  author    = {Macdonald, Jan and Besançon, Mathieu E. and Pokutta, Sebastian},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {14699--14716},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/macdonald2022icml-interpretable/}
}