Toward Practical Human-Interpretable Explanations
Abstract
Model-agnostic feature attribution techniques are used to explain the decisions of complex machine learning (ML) models, including ensemble models and deep neural networks (DNNs). However, since complex ML models perform best when trained on low-level features, the explanations generated by these algorithms are often not interpretable or usable by humans. Recently proposed model-agnostic methods that support the generation of human-interpretable explanations are impractical because they require a fully invertible transformation function that maps the model’s input features to human-interpretable features. While some practical human-interpretable explainability methods exist (e.g., concept-based methods), they typically require direct access to the model and are not fully model-agnostic. In this paper, we introduce Latent SHAP, a model-agnostic black-box feature attribution framework that provides human-interpretable explanations without requiring a fully invertible transformation function. We validate the fidelity of Latent SHAP’s explanations through quantitative faithfulness assessments on two controlled datasets: a self-generated artificial dataset and the dSprites dataset. Furthermore, we demonstrate the practical utility of Latent SHAP in real-world scenarios across domains such as computer vision, natural language processing, and cybersecurity. Each domain involves complex models (ensembles, DNNs, and LLMs) for which invertible transformation functions are not available.
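The kind of attribution the abstract refers to can be illustrated with exact Shapley values computed over a handful of high-level features. This is a generic sketch of Shapley-value feature attribution, not the paper’s Latent SHAP algorithm; the toy model and the "interpretable" feature names are hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, z, baseline):
    """Exact Shapley values for model f at point z against a baseline.

    The coalition value v(S) evaluates f with every feature outside S
    replaced by its baseline value (a common SHAP-style value function).
    """
    n = len(z)

    def v(coalition):
        masked = [z[i] if i in coalition else baseline[i] for i in range(n)]
        return f(masked)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(set(subset) | {i}) - v(set(subset)))
        phi.append(total)
    return phi

# Toy "human-interpretable" features z = (size, color, shape) -- names are
# illustrative only; the model is a made-up function of those features.
f = lambda z: 3 * z[0] + 2 * z[1] * z[2]
print(shapley_values(f, [1, 1, 1], [0, 0, 0]))  # -> [3.0, 1.0, 1.0]
```

The attributions sum to f(z) − f(baseline) = 5, the efficiency property that SHAP-based explanations rely on. In practice the exact computation above is exponential in the number of features, which is why sampling-based approximations (as in KernelSHAP and related methods) are used.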
Cite
Text
Malach et al. "Toward Practical Human-Interpretable Explanations." Machine Learning, 2025. doi:10.1007/s10994-025-06852-8
Markdown
[Malach et al. "Toward Practical Human-Interpretable Explanations." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/malach2025mlj-practical/) doi:10.1007/s10994-025-06852-8
BibTeX
@article{malach2025mlj-practical,
title = {{Toward Practical Human-Interpretable Explanations}},
author = {Malach, Alon and Meiseles, Amiel and Bitton, Ron and Momiyama, Satoru and Araki, Toshinori and Furukawa, Jun and Elovici, Yuval and Shabtai, Asaf},
journal = {Machine Learning},
year = {2025},
pages = {209},
doi = {10.1007/s10994-025-06852-8},
volume = {114},
url = {https://mlanthology.org/mlj/2025/malach2025mlj-practical/}
}