Feature Attribution for Deep Learning Models Through Total Variance Decomposition

Abstract

This paper introduces a new approach to feature attribution for deep learning models, quantifying the importance of specific features in model decisions. By decomposing the total variance of model decisions into explained and unexplained fractions, conditioned on the target feature, we define the feature attribution score as the proportion of explained variance. This method offers a solid statistical foundation and normalized quantitative results. When ample data is available, we compute the score directly from test data. For scarce data, we use constrained sampling with generative diffusion models to represent the conditional distribution at a given feature value. We demonstrate the method’s effectiveness on both a synthetic image dataset with known ground truth and OASIS-3 brain MRIs.
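The decomposition described in the abstract corresponds to the law of total variance, Var(Y) = Var(E[Y | X]) + E[Var(Y | X)], where the first term is the variance explained by the feature X and the second is the unexplained remainder. The following is a minimal sketch of the data-rich case only, assuming a discrete feature and a scalar model output. The function name `explained_variance_fraction` and the group-by-value estimator are illustrative assumptions, not the authors' implementation, and the diffusion-based sampling for scarce data is not covered.

```python
import numpy as np


def explained_variance_fraction(outputs, feature):
    """Estimate Var(E[Y | X]) / Var(Y) for a discrete feature X.

    Hypothetical helper: `outputs` holds scalar model decisions Y on test
    data, `feature` holds the discrete feature value X for each sample.
    """
    outputs = np.asarray(outputs, dtype=float).ravel()
    feature = np.asarray(feature).ravel()
    if outputs.size != feature.size:
        raise ValueError("outputs and feature must have the same length")

    total_var = outputs.var()  # population variance of Y
    if total_var == 0.0:
        return 0.0  # constant output: nothing to explain

    # Group samples by feature value and compute the conditional means E[Y | X = x].
    _, inverse, counts = np.unique(feature, return_inverse=True, return_counts=True)
    group_means = np.bincount(inverse, weights=outputs) / counts

    # Variance of the conditional means, weighted by the empirical P(X = x).
    weights = counts / outputs.size
    explained = np.sum(weights * (group_means - outputs.mean()) ** 2)
    return float(explained / total_var)


# Feature fully determines the output, so all variance is explained.
score_full = explained_variance_fraction([0, 0, 1, 1], [0, 0, 1, 1])

# Feature is unrelated to the output, so no variance is explained.
score_none = explained_variance_fraction([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the explained term can never exceed the total variance, the score lies in [0, 1], which matches the normalized results the abstract refers to. A continuous feature would need binning or a regression estimate of the conditional mean, which this sketch does not attempt.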

Cite

Text

Jin et al. "Feature Attribution for Deep Learning Models Through Total Variance Decomposition." Medical Imaging with Deep Learning, 2025.

Markdown

[Jin et al. "Feature Attribution for Deep Learning Models Through Total Variance Decomposition." Medical Imaging with Deep Learning, 2025.](https://mlanthology.org/midl/2025/jin2025midl-feature/)

BibTeX

@inproceedings{jin2025midl-feature,
  title     = {{Feature Attribution for Deep Learning Models Through Total Variance Decomposition}},
  author    = {Jin, Yinzhu and Zhu, Shen and Fletcher, Tom},
  booktitle = {Medical Imaging with Deep Learning},
  year      = {2025},
  url       = {https://mlanthology.org/midl/2025/jin2025midl-feature/}
}