Concept Bottleneck Model with Zero Performance Loss
Abstract
Interpreting machine learning models with high-level, human-understandable concepts has gained increasing importance. The concept bottleneck model (CBM) is a popular approach for providing such explanations, but it typically sacrifices some prediction power compared with standard black-box models. In this work, we propose CBM-zero, an approach that turns an off-the-shelf black-box model into a CBM without changing its predictions or compromising prediction power. Through an invertible mapping from the model's latent space to a concept space, predictions are decomposed into a linear combination of concepts. This provides concept-based explanations for the complex model and allows us to intervene in its predictions manually. Experiments across benchmarks demonstrate that CBM-zero provides comparable explainability and better accuracy than other CBM methods.
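The abstract's core idea can be illustrated with a small numerical sketch: if a black-box model computes logits from latent features via a linear head, composing that head with the inverse of an invertible latent-to-concept map rewrites the same predictions as a linear combination of concept scores. The shapes and variable names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 8, 3  # toy latent dimension and number of classes

# Black-box model: latent activations z and a final linear head V.
z = rng.normal(size=(5, d))          # latent features for 5 inputs
V = rng.normal(size=(n_classes, d))  # original classification head
logits = z @ V.T                     # black-box predictions

# Invertible mapping W from latent space to a concept space (here a
# random square matrix, invertible with probability one).
W = rng.normal(size=(d, d))
concepts = z @ W.T                   # concept activations c = W z

# Since z = W^{-1} c, the head factors as y = V z = (V W^{-1}) c:
# predictions become a linear combination of concepts, unchanged.
U = V @ np.linalg.inv(W)             # concept-to-prediction weights
logits_via_concepts = concepts @ U.T

assert np.allclose(logits, logits_via_concepts)
```

Because the map is invertible, the concept-space head reproduces the original logits exactly, which is why no prediction power is lost; the rows of `U` then read as per-class concept weights for explanation or manual intervention.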
Cite
Text
Wang et al. "Concept Bottleneck Model with Zero Performance Loss." Conference on Parsimony and Learning, 2025.

Markdown

[Wang et al. "Concept Bottleneck Model with Zero Performance Loss." Conference on Parsimony and Learning, 2025.](https://mlanthology.org/cpal/2025/wang2025cpal-concept/)

BibTeX
@inproceedings{wang2025cpal-concept,
title = {{Concept Bottleneck Model with Zero Performance Loss}},
author = {Wang, Zhenzhen and Popel, Aleksander and Sulam, Jeremias},
booktitle = {Conference on Parsimony and Learning},
year = {2025},
pages = {433--461},
volume = {280},
url = {https://mlanthology.org/cpal/2025/wang2025cpal-concept/}
}