Sum-of-Parts Models: Faithful Attributions for Groups of Features

Abstract

An explanation of a machine learning model is considered "faithful" if it accurately reflects the model's decision-making process. However, explanations such as feature attributions for deep learning are not guaranteed to be faithful and can produce potentially misleading interpretations. In this work, we develop Sum-of-Parts (SOP), a class of models whose predictions come with grouped feature attributions that are faithful-by-construction. An SOP model decomposes a prediction into an interpretable sum of scores, each of which is directly attributable to a sparse group of features. We evaluate SOP on benchmarks with standard interpretability metrics, and in a case study, we use the faithful explanations from SOP to help astrophysicists discover new knowledge about galaxy formation.
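To make the decomposition concrete, below is a minimal sketch of the sum-of-parts idea described in the abstract: the prediction is literally the sum of per-group scores, so each score is a faithful attribution for its sparse feature group. All names (SumOfPartsSketch, group_logits, scorers) are illustrative assumptions, not the authors' implementation, and the soft masks stand in for whatever sparse group-selection mechanism the actual model uses.

```python
import torch
import torch.nn as nn


class SumOfPartsSketch(nn.Module):
    """Predict by summing scores, each computed from a sparse group of features."""

    def __init__(self, num_features: int, num_groups: int):
        super().__init__()
        # One learnable mask per group (sigmoid-relaxed here; a real model
        # would enforce sparsity, e.g. via a penalty or hard selection).
        self.group_logits = nn.Parameter(torch.randn(num_groups, num_features))
        # One scoring head per group; each sees only its masked features.
        self.scorers = nn.ModuleList(
            [nn.Linear(num_features, 1) for _ in range(num_groups)]
        )

    def forward(self, x: torch.Tensor):
        masks = torch.sigmoid(self.group_logits)  # (groups, features)
        scores = torch.stack(
            [head(x * m).squeeze(-1) for head, m in zip(self.scorers, masks)],
            dim=-1,
        )  # (batch, groups): one score per sparse group
        prediction = scores.sum(dim=-1)  # the prediction is the sum of parts
        return prediction, scores, masks


if __name__ == "__main__":
    model = SumOfPartsSketch(num_features=16, num_groups=4)
    x = torch.randn(8, 16)
    pred, scores, masks = model(x)
    # pred == scores.sum(-1), so each group's score is directly attributable
    # to the features its mask selects.
    print(pred.shape, scores.shape, masks.shape)
```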

Cite

Text

You et al. "Sum-of-Parts Models: Faithful Attributions for Groups of Features." NeurIPS 2023 Workshops: XAIA, 2023.

Markdown

[You et al. "Sum-of-Parts Models: Faithful Attributions for Groups of Features." NeurIPS 2023 Workshops: XAIA, 2023.](https://mlanthology.org/neuripsw/2023/you2023neuripsw-sumofparts/)

BibTeX

@inproceedings{you2023neuripsw-sumofparts,
  title     = {{Sum-of-Parts Models: Faithful Attributions for Groups of Features}},
  author    = {You, Weiqiu and Qu, Helen and Gatti, Marco and Jain, Bhuvnesh and Wong, Eric},
  booktitle = {NeurIPS 2023 Workshops: XAIA},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/you2023neuripsw-sumofparts/}
}