Position: XAI Needs Formal Notions of Explanation Correctness

Abstract

The use of machine learning (ML) in critical domains such as medicine poses risks and requires regulation. One requirement is that decisions of ML systems in high-risk applications should be human-understandable. The field of "explainable artificial intelligence" (XAI) seemingly addresses this need. However, in its current form, XAI is unfit to provide quality control for ML; it itself needs scrutiny. Popular XAI methods cannot reliably answer important questions about ML models, their training data, or a given test input. We recapitulate results demonstrating that popular feature attribution and counterfactual estimation methods systematically attribute importance to input features that are independent of the prediction target, and that popular faithfulness metrics incentivize attribution to such features. This limits their utility for purposes such as model and data (in)validation, model improvement, and scientific discovery. We argue that the fundamental reason for this limitation is that current XAI methods do not address well-defined problems and are not evaluated against objective criteria of explanation correctness. Researchers should first formally define the problems they intend to solve and then design methods accordingly. This will lead to notions of explanation correctness that can be theoretically verified and objective metrics of explanation performance that can be assessed using ground-truth data.
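The failure mode summarized in the abstract is easiest to see in a suppressor-variable construction. The sketch below is illustrative only; the variable names and the data-generating process are assumptions made for this page, not the paper's experimental setup. It builds a two-feature linear regression in which feature x2 is statistically independent of the target y, yet the optimal model must give it a nonzero weight to cancel the noise it shares with x1.

# Minimal suppressor-variable sketch (illustrative assumptions, not the
# paper's setup): x2 carries no information about the target y, but the
# optimal linear model assigns it a large weight to denoise x1.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

y = rng.standard_normal(n)    # prediction target
d = rng.standard_normal(n)    # shared distractor, independent of y
x1 = y + d                    # informative feature, contaminated by d
x2 = d                        # suppressor feature: independent of y

X = np.column_stack([x1, x2])
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares fit

print(np.corrcoef(x2, y)[0, 1])   # ~0: x2 is independent of the target
print(w)                          # ~[1, -1]: x2 receives a large weight

For a linear model, gradient-based attributions coincide with the weight vector, so x2 receives an attribution of the same magnitude as x1 despite being independent of y. This is the kind of systematic misattribution to target-independent features that the abstract refers to.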

Cite

Text

Haufe et al. "Position: XAI Needs Formal Notions of Explanation Correctness." NeurIPS 2024 Workshops: InterpretableAI, 2024.

Markdown

[Haufe et al. "Position: XAI Needs Formal Notions of Explanation Correctness." NeurIPS 2024 Workshops: InterpretableAI, 2024.](https://mlanthology.org/neuripsw/2024/haufe2024neuripsw-position/)

BibTeX

@inproceedings{haufe2024neuripsw-position,
  title     = {{Position: XAI Needs Formal Notions of Explanation Correctness}},
  author    = {Haufe, Stefan and Wilming, Rick and Clark, Benedict and Zhumagambetov, Rustam and Panknin, Danny and Boubekki, Ahcene},
  booktitle = {NeurIPS 2024 Workshops: InterpretableAI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/haufe2024neuripsw-position/}
}