An Evolutionary Algorithm for Black-Box Adversarial Attack Against Explainable Methods
Abstract
The explainability of deep neural networks (DNNs) remains a major challenge in developing trustworthy AI, particularly in high-stakes domains such as medical imaging. Although explainable AI (XAI) techniques have advanced, they remain vulnerable to adversarial perturbations, underscoring the need for more robust evaluation frameworks. Existing adversarial attacks often focus on specific explanation strategies, while recent research has introduced black-box attacks capable of targeting multiple XAI methods. However, these approaches typically craft pixel-level perturbations that require a large number of queries and struggle to effectively attack less granular XAI methods such as Grad-CAM and LIME. To overcome these limitations, we propose a novel attack that generates perturbations using semi-transparent, RGB-valued circles optimized via an evolutionary strategy. This design reduces the number of tunable parameters, improves attack efficiency, and is adaptable to XAI methods with varying levels of granularity. Extensive experiments on medical and natural image datasets demonstrate that our method outperforms state-of-the-art techniques, exposing critical vulnerabilities in current XAI systems and highlighting the need for more robust interpretability frameworks.
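To make the perturbation model in the abstract concrete, below is a minimal sketch of the idea: a genome of semi-transparent, RGB-valued circles is alpha-blended onto the image, and a simple (mu, lambda) evolution strategy searches the circle parameters under black-box queries. The `attack_loss` placeholder, the 7-gene circle encoding, and all hyperparameters are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def attack_loss(image: np.ndarray) -> float:
    """Hypothetical black-box objective: lower means the model's explanation
    (e.g. a saliency map) diverges more from the clean image's explanation.
    A random stand-in here; in practice this would query the model + XAI method."""
    return float(np.random.rand())

def render_circles(image: np.ndarray, genome: np.ndarray, n_circles: int) -> np.ndarray:
    """Alpha-blend semi-transparent RGB circles onto a copy of `image`.

    Each circle uses 7 genes in [0, 1]: (cx, cy, radius, r, g, b, alpha)."""
    h, w, _ = image.shape
    out = image.astype(np.float32).copy()
    yy, xx = np.mgrid[0:h, 0:w]
    for i in range(n_circles):
        cx, cy, rad, r, g, b, a = genome[7 * i: 7 * i + 7]
        # Pixels inside the circle; radius capped at 20% of the shorter side.
        mask = (xx - cx * w) ** 2 + (yy - cy * h) ** 2 <= (rad * 0.2 * min(h, w)) ** 2
        color = np.array([r, g, b], dtype=np.float32) * 255.0
        out[mask] = (1.0 - a) * out[mask] + a * color
    return np.clip(out, 0, 255).astype(np.uint8)

def evolve(image: np.ndarray, n_circles: int = 10, pop_size: int = 20,
           n_parents: int = 5, generations: int = 50, sigma: float = 0.1) -> np.ndarray:
    """Simple (mu, lambda) evolution strategy over the flat circle genome."""
    dim = 7 * n_circles
    rng = np.random.default_rng(0)
    parents = rng.random((n_parents, dim))
    for _ in range(generations):
        # Sample offspring from random parents and apply Gaussian mutation.
        offspring = parents[rng.integers(n_parents, size=pop_size)]
        offspring = np.clip(offspring + sigma * rng.standard_normal(offspring.shape), 0.0, 1.0)
        scores = [attack_loss(render_circles(image, g, n_circles)) for g in offspring]
        parents = offspring[np.argsort(scores)[:n_parents]]  # keep the best mu
    return render_circles(image, parents[0], n_circles)

if __name__ == "__main__":
    clean = np.zeros((64, 64, 3), dtype=np.uint8)  # toy image
    adversarial = evolve(clean)
    print(adversarial.shape)  # (64, 64, 3)
```

Note the small search space: 7 parameters per circle rather than one per pixel, which is what would let such an attack use far fewer queries and remain meaningful against coarse explanation maps like Grad-CAM and LIME.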
Cite

Text
Williams et al. "An Evolutionary Algorithm for Black-Box Adversarial Attack Against Explainable Methods." Transactions on Machine Learning Research, 2025.

Markdown
[Williams et al. "An Evolutionary Algorithm for Black-Box Adversarial Attack Against Explainable Methods." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/williams2025tmlr-evolutionary/)

BibTeX
@article{williams2025tmlr-evolutionary,
  title   = {{An Evolutionary Algorithm for Black-Box Adversarial Attack Against Explainable Methods}},
  author  = {Williams, Phoenix Neale and Schrouff, Jessica and Goetz, Lea},
  journal = {Transactions on Machine Learning Research},
  year    = {2025},
  url     = {https://mlanthology.org/tmlr/2025/williams2025tmlr-evolutionary/}
}