Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups
Abstract
Self-attributing neural networks (SANNs) present a potential path towards interpretable models for high-dimensional problems, but they often face significant trade-offs in performance. In this work, we formally prove a lower bound on the error of per-feature SANNs, whereas group-based SANNs can achieve zero error and thus high performance. Motivated by these insights, we propose Sum-of-Parts (SOP), a framework that transforms any differentiable model into a group-based SANN, where feature groups are learned end-to-end without group supervision. SOP achieves state-of-the-art performance for SANNs on vision and language tasks, and we validate that the groups are interpretable on a range of quantitative and semantic metrics. We further validate the utility of SOP explanations in model debugging and cosmological scientific discovery.
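The abstract's core idea, a prediction formed as a sum of per-group scores, can be illustrated with a minimal sketch. This is not the paper's implementation: the backbone `f`, the fixed masks, and all names here are hypothetical stand-ins (in SOP the masks are produced by a learned module and trained end-to-end).

```python
import numpy as np

rng = np.random.default_rng(0)

def sop_predict(x, masks, f):
    """Sum-of-parts prediction: score each masked copy of the input
    with the backbone f, then sum the per-group scores. Each score is
    that group's additive attribution to the prediction."""
    parts = [f(m * x) for m in masks]
    return sum(parts), parts

# Toy backbone: a fixed linear scorer (stand-in for any differentiable model).
w = rng.normal(size=8)
f = lambda z: float(w @ z)

x = rng.normal(size=8)

# Two hypothetical "learned" groups: first half vs. second half of the features.
masks = [np.r_[np.ones(4), np.zeros(4)],
         np.r_[np.zeros(4), np.ones(4)]]

pred, parts = sop_predict(x, masks, f)

# With disjoint hard masks and a linear backbone, the parts sum exactly
# to the full prediction f(x); in general the sum is the model's output
# and each part is one group's contribution.
assert np.isclose(pred, f(x))
```

With a nonlinear backbone the parts no longer decompose `f(x)` exactly; instead the sum of parts *is* the model's output, which is what makes the attribution faithful by construction.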
Cite
Text
You et al. "Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[You et al. "Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/you2025icml-sumofparts/)
BibTeX
@inproceedings{you2025icml-sumofparts,
  title     = {{Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups}},
  author    = {You, Weiqiu and Qu, Helen and Gatti, Marco and Jain, Bhuvnesh and Wong, Eric},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {72747--72785},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/you2025icml-sumofparts/}
}