Framework for Evaluating Faithfulness of Local Explanations
Abstract
We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.
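To make the abstract's quantities concrete, below is a minimal Monte Carlo sketch of how consistency and sufficiency of a black-box explanation system might be estimated from samples of the test-time distribution. The names `predict`, `explain`, `applies`, and the exact definitions are illustrative assumptions paraphrased from the abstract (pairs with equal explanations should get equal predictions; points covered by an explanation, e.g. an anchor rule, should get the explained point's prediction), not the paper's formal measures or estimators.

import itertools

def estimate_consistency(predict, explain, samples):
    """Fraction of sampled pairs that share an explanation and also share a prediction."""
    preds = [predict(x) for x in samples]
    expls = [explain(x) for x in samples]
    agree, total = 0, 0
    for i, j in itertools.combinations(range(len(samples)), 2):
        if expls[i] == expls[j]:          # pair received the same explanation
            total += 1
            agree += int(preds[i] == preds[j])
    return agree / total if total else float("nan")

def estimate_sufficiency(predict, explain, applies, samples):
    """Fraction of covered points that share the explained point's prediction.

    `applies(e, x2)` is a hypothetical coverage test, e.g. whether x2
    satisfies the anchor rule e produced for some other point x.
    """
    agree, total = 0, 0
    for x in samples:
        e, y = explain(x), predict(x)
        for x2 in samples:
            if x2 is x or not applies(e, x2):
                continue
            total += 1
            agree += int(predict(x2) == y)
    return agree / total if total else float("nan")

Both estimates are computed with respect to whatever distribution `samples` is drawn from, which matches the abstract's point that these measures depend on the test-time data distribution.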
Cite
Text
Dasgupta et al. "Framework for Evaluating Faithfulness of Local Explanations." International Conference on Machine Learning, 2022.
Markdown
[Dasgupta et al. "Framework for Evaluating Faithfulness of Local Explanations." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/dasgupta2022icml-framework/)
BibTeX
@inproceedings{dasgupta2022icml-framework,
  title = {{Framework for Evaluating Faithfulness of Local Explanations}},
  author = {Dasgupta, Sanjoy and Frost, Nave and Moshkovitz, Michal},
  booktitle = {International Conference on Machine Learning},
  year = {2022},
  pages = {4794--4815},
  volume = {162},
  url = {https://mlanthology.org/icml/2022/dasgupta2022icml-framework/}
}