Debugging Concept Bottlenecks Through Intervention: Shortcut Removal and Retraining

Enouen, Eric; Galhotra, Sainyam

ICLRW 2025

Abstract

Machine learning models often learn unintended shortcuts (spurious correlations) that do not reflect the true causal structure of a task and thus degrade dramatically under subpopulation shift. This problem becomes especially severe in high-stakes domains where the cost of relying on misaligned shortcuts is prohibitive. To address this challenge, concept bottlenecks explicitly factor predictions into high-level concepts and a simple decision layer, enabling experts to diagnose whether learned concepts align with their domain knowledge. Yet, simply removing undesirable concepts after training is insufficient to prevent shortcuts when the concept encoder is incomplete or entangled. In this work, we propose *CBDebug*, a novel framework to debug concept bottlenecks for robustness under subpopulation shift. First, a domain expert identifies and removes spurious concepts using model explanations (the *Removal* step). Then, leveraging this human feedback, we disentangle or replace the removed shortcuts by retraining on a rebalanced dataset based on the causal graph (the *Retraining* step). Empirically, *CBDebug* significantly outperforms existing concept-based methods. Overall, our work demonstrates how expert-guided debugging of concept bottlenecks can achieve interpretability and robustness, promoting alignment of a model’s internal reasoning with how humans reason.
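The two steps named in the abstract can be illustrated with a minimal sketch. It assumes a linear decision layer over concept activations, models the *Removal* step as masking a concept's weight, and stands in for the *Retraining* step's rebalancing with plain equal-weight group reweighting over (label, shortcut) pairs. That reweighting is a common generic technique, not the causal-graph-based procedure the paper describes, and the names `predict`, `rebalance_weights`, `keep_mask`, and `shortcut` are hypothetical.

```python
import numpy as np


def predict(concepts, weights, bias, keep_mask):
    """Linear decision layer over concept activations.

    Concepts whose keep_mask entry is 0 contribute nothing to the score
    (an assumed stand-in for the expert's Removal step).
    """
    return concepts @ (weights * keep_mask) + bias


def rebalance_weights(labels, shortcut):
    """Per-sample weights giving every observed (label, shortcut) group
    the same total weight, so the shortcut is uninformative about the label
    under the reweighted data (assumed stand-in for rebalancing).
    """
    labels = np.asarray(labels)
    shortcut = np.asarray(shortcut)
    sample_w = np.zeros(len(labels), dtype=float)
    groups = sorted(set(zip(labels.tolist(), shortcut.tolist())))
    for y, s in groups:
        in_group = (labels == y) & (shortcut == s)
        sample_w[in_group] = 1.0 / (len(groups) * in_group.sum())
    return sample_w


# Toy data: three concepts, the second one flagged as a shortcut.
concepts = np.array([[1.0, 2.0, 3.0]])
weights = np.array([0.5, -1.0, 2.0])
keep_mask = np.array([1.0, 0.0, 1.0])
score = predict(concepts, weights, bias=0.1, keep_mask=keep_mask)

# Toy labels and shortcut attribute for six samples.
sample_w = rebalance_weights([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0])
```

In a retraining loop, weights like `sample_w` would typically be passed as per-sample loss weights, so that the retrained encoder has no incentive to recover the removed shortcut through the remaining concepts.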

Cite

Text

Enouen and Galhotra. "Debugging Concept Bottlenecks Through Intervention: Shortcut Removal and Retraining." ICLR 2025 Workshops: SCSL, 2025.

Markdown

[Enouen and Galhotra. "Debugging Concept Bottlenecks Through Intervention: Shortcut Removal and Retraining." ICLR 2025 Workshops: SCSL, 2025.](https://mlanthology.org/iclrw/2025/enouen2025iclrw-debugging/)

BibTeX

@inproceedings{enouen2025iclrw-debugging,
  title     = {{Debugging Concept Bottlenecks Through Intervention: Shortcut Removal and Retraining}},
  author    = {Enouen, Eric and Galhotra, Sainyam},
  booktitle = {ICLR 2025 Workshops: SCSL},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/enouen2025iclrw-debugging/}
}