Toward Practical Human-Interpretable Explanations
Abstract
Model-agnostic feature attribution techniques are used to explain the decisions of complex machine learning (ML) models, including ensemble models and deep neural networks (DNNs). However, since complex ML models perform best when trained on low-level features, the explanations generated by these algorithms are often not interpretable or usable by humans. Recently proposed model-agnostic methods that support the generation of human-interpretable explanations are impractical because they require a fully invertible transformation function that maps the model’s input features to human-interpretable features. While some practical human-interpretable explainability methods exist (e.g., concept-based methods), they typically require direct access to the model and are not fully model-agnostic. In this paper, we introduce Latent SHAP, a model-agnostic black-box feature attribution framework that provides human-interpretable explanations without requiring a fully invertible transformation function. We validate the fidelity of Latent SHAP’s explanations through quantitative faithfulness assessments on two controlled datasets: a self-generated artificial dataset and the dSprites dataset. Furthermore, we demonstrate the practical utility of Latent SHAP in real-world scenarios across domains such as computer vision, natural language processing, and cybersecurity. Each domain involves complex models (ensembles, DNNs, and LLMs) for which invertible transformation functions are not available.
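The kind of attribution the abstract refers to can be illustrated with exact Shapley values computed over a handful of high-level features. This is a generic sketch of Shapley-value feature attribution, not the paper’s Latent SHAP algorithm; the toy model and the "interpretable" feature names are hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, z, baseline):
    """Exact Shapley values for model f at point z against a baseline.

    The coalition value v(S) evaluates f with every feature outside S
    replaced by its baseline value (a common SHAP-style value function).
    """
    n = len(z)

    def v(coalition):
        masked = [z[i] if i in coalition else baseline[i] for i in range(n)]
        return f(masked)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(set(subset) | {i}) - v(set(subset)))
        phi.append(total)
    return phi

# Toy "human-interpretable" features z = (size, color, shape) -- names are
# illustrative only; the model is a made-up function of those features.
f = lambda z: 3 * z[0] + 2 * z[1] * z[2]
print(shapley_values(f, [1, 1, 1], [0, 0, 0]))  # -> [3.0, 1.0, 1.0]
```

The attributions sum to f(z) − f(baseline) = 5, the efficiency property that SHAP-based explanations rely on. In practice the exact computation above is exponential in the number of features, which is why sampling-based approximations (as in KernelSHAP and related methods) are used.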
Cite
Text
Malach et al. "Toward Practical Human-Interpretable Explanations." Machine Learning, 2025. doi:10.1007/s10994-025-06852-8
Markdown
[Malach et al. "Toward Practical Human-Interpretable Explanations." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/malach2025mlj-practical/) doi:10.1007/s10994-025-06852-8
BibTeX
@article{malach2025mlj-practical,
title = {{Toward Practical Human-Interpretable Explanations}},
author = {Malach, Alon and Meiseles, Amiel and Bitton, Ron and Momiyama, Satoru and Araki, Toshinori and Furukawa, Jun and Elovici, Yuval and Shabtai, Asaf},
journal = {Machine Learning},
year = {2025},
pages = {209},
doi = {10.1007/s10994-025-06852-8},
volume = {114},
url = {https://mlanthology.org/mlj/2025/malach2025mlj-practical/}
}