Disentangled Concept-Residual Models: Bridging the Interpretability–Performance Gap for Incomplete Concept Sets
Abstract
Deploying AI in high-stakes settings requires models that are not only accurate but also interpretable and amenable to human oversight. Concept Bottleneck Models (CBMs) support these goals by structuring predictions around human-understandable concepts, enabling interpretability and post-hoc human intervenability. However, CBMs rely on a ‘complete’ concept set, requiring practitioners to define and label enough concepts to match the predictive power of black-box models. To relax this requirement, prior work introduced Concept-Residual Models (CRMs), which add residual connections that bypass the concept layer and recover information missing from an incomplete concept set. While effective in bridging the performance gap, these residuals can redundantly encode concept information, a phenomenon we term concept-residual overlap. In this work, we investigate the effects of concept-residual overlap and evaluate strategies to mitigate it. We (1) define metrics to quantify the extent of concept-residual overlap in CRMs; (2) introduce complementary metrics to evaluate how this overlap impacts interpretability, concept importance, and the effectiveness of concept-based interventions; and (3) present Disentangled Concept-Residual Models (D-CRMs), a general class of CRMs designed to mitigate this issue. Within this class, we propose a novel disentanglement approach based on minimizing mutual information (MI). Using CelebA, CIFAR100, AA2, CUB, and OAI, we show that standard CRMs exhibit significant concept-residual overlap, and that reducing this overlap with MI-based D-CRMs restores key properties of CBMs, including interpretability, functional reliance on concepts, and intervention robustness, without sacrificing predictive performance.
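To make the overlap idea concrete, here is a minimal sketch of how redundancy between a concept embedding and a residual embedding can be measured with a simple cross-covariance penalty. This is only an illustrative decorrelation proxy, not the paper's MI-based estimator, and all array names and shapes below are hypothetical.

```python
import numpy as np

def overlap_penalty(c, r):
    """Squared Frobenius norm of the cross-covariance Cov(c, r).

    A cheap proxy for concept-residual overlap: it is near zero when
    the residual carries no linear information about the concepts, and
    grows when the residual redundantly encodes them. The paper's
    D-CRMs instead minimize a mutual-information estimate, which also
    penalizes nonlinear dependence.
    """
    c = c - c.mean(axis=0)
    r = r - r.mean(axis=0)
    cov = c.T @ r / (len(c) - 1)
    return float(np.sum(cov ** 2))

rng = np.random.default_rng(0)
c = rng.normal(size=(512, 8))            # concept activations (toy data)
r_indep = rng.normal(size=(512, 16))     # residual independent of concepts
r_leaky = np.concatenate(                # residual that copies the concepts
    [c, rng.normal(size=(512, 8))], axis=1)

print(overlap_penalty(c, r_indep))  # small: little overlap
print(overlap_penalty(c, r_leaky))  # large: residual re-encodes concepts
```

Adding such a penalty to a CRM's training loss pushes the residual toward encoding only what the concepts miss; an MI-based objective, as in the paper, additionally suppresses nonlinear redundancy.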
Cite
Text
Zabounidis et al. "Disentangled Concept-Residual Models: Bridging the Interpretability–Performance Gap for Incomplete Concept Sets." Transactions on Machine Learning Research, 2026.
Markdown
[Zabounidis et al. "Disentangled Concept-Residual Models: Bridging the Interpretability–Performance Gap for Incomplete Concept Sets." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/zabounidis2026tmlr-disentangled/)
BibTeX
@article{zabounidis2026tmlr-disentangled,
title = {{Disentangled Concept-Residual Models: Bridging the Interpretability–Performance Gap for Incomplete Concept Sets}},
author = {Zabounidis, Renos and Oguntola, Ini and Zhao, Konghao and Campbell, Joseph and Kim, Woojun and Stepputtis, Simon and Sycara, Katia P.},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/zabounidis2026tmlr-disentangled/}
}