On Gradient-like Explanation Under a Black-Box Setting: When Black-Box Explanations Become as Good as White-Box
Abstract
Attribution methods shed light on data-driven approaches such as deep learning models by uncovering the most influential features behind a decision to be explained. While determining feature attributions via gradients delivers promising results, the internal access required to obtain gradients can be impractical under safety concerns, which limits the applicability of gradient-based approaches. In response to this limitation, this paper presents GEEX (gradient-estimation-based explanation), a method that produces gradient-like explanations using only query-level access. The proposed approach satisfies a set of fundamental properties for attribution methods, which are rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, experiments focused on image data empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.
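To illustrate what "gradient-like explanations using only query-level access" can mean in practice, the following is a minimal sketch of a generic zeroth-order gradient estimator based on antithetic Gaussian perturbations. It is not the GEEX estimator, whose details the abstract does not give; `estimate_gradient`, `toy_model`, and all parameter values are hypothetical names and choices made for illustration only.

```python
import numpy as np


def estimate_gradient(query_fn, x, n_samples=5000, sigma=0.1, seed=0):
    """Estimate the gradient of a scalar black-box function at x.

    Only calls to query_fn are used; no internal access is assumed.
    This is a generic Gaussian-smoothing estimator, not the GEEX method.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        # Random perturbation direction with the same shape as the input.
        u = rng.standard_normal(x.shape)
        # Antithetic pair: two queries per direction reduce variance.
        delta = query_fn(x + sigma * u) - query_fn(x - sigma * u)
        grad += (delta / (2.0 * sigma)) * u
    return grad / n_samples


def toy_model(x):
    """Hypothetical stand-in for a black-box model returning one score."""
    w = np.array([2.0, -1.0, 0.0, 0.5])
    return float(np.tanh(w @ x))


if __name__ == "__main__":
    x = np.array([0.1, 0.2, 0.3, 0.4])
    attribution = estimate_gradient(toy_model, x)
    # Features with larger absolute values are estimated to be more influential.
    print(attribution)
```

For this toy model the true gradient is available in closed form, so the estimate can be checked against it; for a real black-box model no such reference exists, and the number of queries (`n_samples`) and the perturbation scale (`sigma`) would have to be chosen by trading query cost against estimation noise.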
Cite
Text
Cai and Wunder. "On Gradient-like Explanation Under a Black-Box Setting: When Black-Box Explanations Become as Good as White-Box." International Conference on Machine Learning, 2024.
Markdown
[Cai and Wunder. "On Gradient-like Explanation Under a Black-Box Setting: When Black-Box Explanations Become as Good as White-Box." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/cai2024icml-gradientlike/)
BibTeX
@inproceedings{cai2024icml-gradientlike,
title = {{On Gradient-like Explanation Under a Black-Box Setting: When Black-Box Explanations Become as Good as White-Box}},
author = {Cai, Yi and Wunder, Gerhard},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {5360--5382},
volume = {235},
url = {https://mlanthology.org/icml/2024/cai2024icml-gradientlike/}
}