Evaluating Shortcut Utilization in Deep Learning Disease Classification Through Counterfactual Analysis
Abstract
Although deep learning models can surpass human performance in many medical image analysis tasks, they remain vulnerable to algorithmic shortcuts, where spurious correlations in the data are exploited, which may reduce trust in their predictions. This issue is especially concerning when models rely on protected attributes (e.g., sex, race, or site) as shortcuts. Such shortcut reliance not only impairs their ability to generalize to unseen datasets but also raises fairness concerns, ultimately undermining their purpose for computer-aided diagnosis. Previous techniques for analyzing protected attributes, such as supervised prediction layer information tests, only highlight the presence of protected attributes in the feature space but do not confirm their role in solving the primary task. Determining the impact of protected attributes as shortcuts is particularly challenging because it requires knowing how a model would perform without those attributes, a counterfactual scenario that is typically unattainable in real-world data. As a workaround, researchers have addressed the absence of counterfactuals by generating synthetic datasets with and without protected attributes. In this study, we propose a novel approach to evaluate real-world datasets and determine the extent to which each protected attribute is used as a shortcut in a classification task. To this end, we define and train a causal generative model to produce causally-grounded counterfactuals, removing protected attributes from activations and allowing us to measure their impact on model performance. Employing T1-weighted MRI data from 9 sites (835 subjects: 426 with Parkinson’s disease (PD) and 409 healthy), we demonstrate that counterfactually removing the 'site' attribute from the penultimate layer of a trained classification model reduced the AUROC for PD classification from 0.74 to 0.65, indicating that using 'site' as a shortcut accounted for an AUROC gain of 0.09 (9 percentage points).
In contrast, counterfactually removing the 'sex' attribute had minimal impact on performance, with a change of only 0.004, indicating that 'sex' was not utilized as a shortcut by the classification model. The proposed method offers a robust framework for assessing shortcut utilization in medical image classification, paving the way for improved bias detection and mitigation in medical imaging tasks. The code for this work is available at https://github.com/vibujithan/shortcut-analysis.
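The evaluation step described in the abstract, comparing classification performance with and without a protected attribute, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes the classifier's scores on the factual activations and on the counterfactual activations are already available, and it does not model the causal generative step at all. The function names `auroc` and `shortcut_utilization` and the toy scores are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def shortcut_utilization(
    labels: Sequence[int],
    factual_scores: Sequence[float],
    counterfactual_scores: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (factual AUROC, counterfactual AUROC, drop).

    A large positive drop suggests the removed attribute was helping the
    classifier, i.e. it may have been used as a shortcut.
    """
    auc_f = auroc(labels, factual_scores)
    auc_cf = auroc(labels, counterfactual_scores)
    return auc_f, auc_cf, auc_f - auc_cf


# Hypothetical toy data: 1 = disease, 0 = healthy. Not taken from the paper.
labels = [1, 1, 1, 0, 0, 0]
factual = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]          # scores on original activations
counterfactual = [0.9, 0.5, 0.4, 0.7, 0.6, 0.2]   # scores after attribute removal

auc_f, auc_cf, drop = shortcut_utilization(labels, factual, counterfactual)
print(f"factual AUROC={auc_f:.3f}, counterfactual AUROC={auc_cf:.3f}, drop={drop:.3f}")
```

Applied to the figures reported in the abstract, the same difference gives 0.74 - 0.65 = 0.09 for 'site', compared with a change of only 0.004 for 'sex'.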
Cite
Text
Vigneshwaran et al. "Evaluating Shortcut Utilization in Deep Learning Disease Classification Through Counterfactual Analysis." Medical Imaging with Deep Learning, 2025.
Markdown
[Vigneshwaran et al. "Evaluating Shortcut Utilization in Deep Learning Disease Classification Through Counterfactual Analysis." Medical Imaging with Deep Learning, 2025.](https://mlanthology.org/midl/2025/vigneshwaran2025midl-evaluating/)
BibTeX
@inproceedings{vigneshwaran2025midl-evaluating,
title = {{Evaluating Shortcut Utilization in Deep Learning Disease Classification Through Counterfactual Analysis}},
author = {Vigneshwaran, Vibujithan and Stanley, Emma A.M. and Souza, Raissa and Ohara, Erik and Wilms, Matthias and Forkert, Nils},
booktitle = {Medical Imaging with Deep Learning},
year = {2025},
url = {https://mlanthology.org/midl/2025/vigneshwaran2025midl-evaluating/}
}