Correcting Misinterpretations of Additive Models

Abstract

Correct model interpretation in high-stakes settings is critical, yet both post-hoc feature attribution methods and so-called intrinsically interpretable models can systematically attribute false-positive importance to non-informative features such as suppressor variables. Specifically, both linear models and their powerful non-linear generalisations, such as Generalized Additive Models (GAMs), are susceptible to spurious attributions to suppressors. We present a principled generalisation of activation patterns, originally developed to make linear models interpretable, to additive models; it correctly rejects suppressor effects for non-linear features. This yields PatternGAM, an importance attribution method based on univariate generative surrogate models for the broad family of additive models, and PatternQLR for polynomial models. Empirical evaluations on the XAI-TRIS benchmark, using a novel false-negative-invariant formulation of the earth mover's distance accuracy metric, demonstrate significant improvements over popular feature attribution methods and over the traditional interpretation of additive models. Finally, real-world case studies on the COMPAS and MIMIC-IV datasets provide new insights into the role of specific features by disentangling genuine target-related information from suppression effects that would mislead conventional GAM interpretations.
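To make the suppressor problem concrete, here is a minimal Python sketch. It uses a hypothetical two-feature simulation, not the XAI-TRIS benchmark and not PatternGAM or PatternQLR. Feature x2 contains only a distractor d that is independent of the target, yet a least-squares model gives it a weight of about -1 because subtracting it cancels the distractor in x1. The classical activation pattern for linear models, a = Cov(x) w (defined only up to a positive scale factor), is the linear construction that the abstract generalises. Here it assigns x2 approximately zero importance. The variable names and the simulation setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical suppressor simulation (illustrative assumption):
# z is the target-related signal, d is a non-informative distractor.
z = rng.normal(size=n)
d = rng.normal(size=n)
X = np.column_stack([z + d, d])  # x1 = signal + distractor, x2 = distractor only (suppressor)
y = z                            # target depends on z alone

# Ordinary least squares weights; since y = x1 - x2 exactly, w is [1, -1]
# up to floating-point error, so the suppressor x2 gets a large weight.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Linear activation pattern a = Cov(X) w, i.e. the covariance of each
# feature with the model output X @ w. x2 is uncorrelated with the output,
# so its pattern entry is close to zero.
A = np.cov(X, rowvar=False) @ w

print("weights:", np.round(w, 2))
print("pattern:", np.round(A, 2))
```

The weights suggest that both features matter equally in magnitude, while the pattern attributes importance only to x1, the feature that actually carries target-related signal. Extending this kind of correction from linear weights to the non-linear shape functions of additive models is the problem the paper addresses.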

Cite

Text

Clark et al. "Correcting Misinterpretations of Additive Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Clark et al. "Correcting Misinterpretations of Additive Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/clark2025neurips-correcting/)

BibTeX

@inproceedings{clark2025neurips-correcting,
  title     = {{Correcting Misinterpretations of Additive Models}},
  author    = {Clark, Benedict and Wilming, Rick and Schulz, Hjalmar and Zhumagambetov, Rustam and Panknin, Danny and Haufe, Stefan},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/clark2025neurips-correcting/}
}