Evaluating the Robustness of Explainable AI in Medical Image Recognition Under Natural and Adversarial Data Corruption

Abstract

The integration of Explainable AI (XAI) into healthcare promises greater transparency and interpretability of machine learning models, enabling clinicians to understand predictions and make more reliable medical decisions. Yet, the robustness of XAI methods remains uncertain, as small input perturbations can drastically change their explanations, posing critical risks in clinical settings where they may lead to misdiagnoses or inappropriate treatment. Motivated by the central role of XAI in healthcare decision-making, this paper examines its robustness in the presence of data corruption. We systematically evaluate the stability of widely used XAI techniques against both naturally occurring noise (e.g., JPEG compression) and adversarial manipulations that alter explanations without affecting model predictions. To this end, we introduce a set of evaluation metrics that capture complementary aspects of explanation stability, ranging from pixel-level consistency to spatial coherence, and propose a protocol for assessing the resilience of XAI methods across diverse perturbation sources. Our analysis spans three medical imaging datasets, various convolutional and transformer models, and ten post-hoc XAI methods, including Grad-CAM++ for convolutional networks and LibraGrad for vision transformers. We find that current XAI techniques are often unstable, even under imperceptible perturbations. Under adversarial noise, a clear set of robust methods emerges; under natural noise, performance varies, with some methods maintaining spatial stability and others preserving pixel-wise consistency. Together, these results highlight the need for multi-perspective evaluation when selecting XAI techniques in practice.
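The two complementary notions of stability mentioned above can be illustrated with a minimal sketch. The paper's actual metric definitions are not given on this page, so the functions below are assumptions: `pixel_consistency` uses cosine similarity between flattened saliency maps, and `spatial_coherence` uses the intersection-over-union of the top-k% most salient pixels, comparing an explanation before and after perturbation.

```python
import numpy as np

def pixel_consistency(a: np.ndarray, b: np.ndarray) -> float:
    """Pixel-level stability: cosine similarity of flattened saliency maps."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def spatial_coherence(a: np.ndarray, b: np.ndarray, k: float = 0.1) -> float:
    """Spatial stability: IoU of the top-k fraction of most salient pixels."""
    n = max(1, int(k * a.size))
    top_a = set(np.argsort(a.ravel())[-n:])
    top_b = set(np.argsort(b.ravel())[-n:])
    return len(top_a & top_b) / len(top_a | top_b)

# An unperturbed explanation compared with itself scores 1.0 on both metrics;
# a corrupted input that shifts the saliency map lowers one or both scores.
saliency = np.random.rand(8, 8)
print(pixel_consistency(saliency, saliency), spatial_coherence(saliency, saliency))
```

A method can score well on one metric and poorly on the other (e.g., saliency mass redistributed within the same region preserves IoU but not pixel-wise similarity), which is why the abstract argues for multi-perspective evaluation.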

Cite

Text

Repetto et al. "Evaluating the Robustness of Explainable AI in Medical Image Recognition Under Natural and Adversarial Data Corruption." Machine Learning, 2026. doi:10.1007/s10994-025-06919-6

Markdown

[Repetto et al. "Evaluating the Robustness of Explainable AI in Medical Image Recognition Under Natural and Adversarial Data Corruption." Machine Learning, 2026.](https://mlanthology.org/mlj/2026/repetto2026mlj-evaluating/) doi:10.1007/s10994-025-06919-6

BibTeX

@article{repetto2026mlj-evaluating,
  title     = {{Evaluating the Robustness of Explainable AI in Medical Image Recognition Under Natural and Adversarial Data Corruption}},
  author    = {Repetto, Sara and Maljkovic, Igor and Lotto, Michele and Cinà, Antonio Emanuele and Vascon, Sebastiano and Roli, Fabio},
  journal   = {Machine Learning},
  year      = {2026},
  pages     = {4},
  doi       = {10.1007/s10994-025-06919-6},
  volume    = {115},
  url       = {https://mlanthology.org/mlj/2026/repetto2026mlj-evaluating/}
}