Concept Bottleneck Model with Zero Performance Loss
Abstract
Interpreting machine learning models with high-level, human-understandable concepts has gained increasing importance. The concept bottleneck model (CBM) is a popular approach for providing such explanations, but it typically sacrifices some prediction power compared with standard black-box models. In this work, we propose CBM-zero, an approach that turns an off-the-shelf black-box model into a CBM without changing its predictions or compromising prediction power. Through an invertible mapping from the model's latent space to a concept space, predictions are decomposed into a linear combination of concepts. This provides concept-based explanations for the complex model and allows us to intervene in its predictions manually. Experiments across benchmarks demonstrate that CBM-zero provides comparable explainability and better accuracy than other CBM methods.
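The abstract's core idea can be illustrated with a small numerical sketch: if a black-box model computes logits from latent features via a linear head, composing that head with the inverse of an invertible latent-to-concept map rewrites the same predictions as a linear combination of concept scores. The shapes and variable names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 8, 3  # toy latent dimension and number of classes

# Black-box model: latent activations z and a final linear head V.
z = rng.normal(size=(5, d))          # latent features for 5 inputs
V = rng.normal(size=(n_classes, d))  # original classification head
logits = z @ V.T                     # black-box predictions

# Invertible mapping W from latent space to a concept space (here a
# random square matrix, invertible with probability one).
W = rng.normal(size=(d, d))
concepts = z @ W.T                   # concept activations c = W z

# Since z = W^{-1} c, the head factors as y = V z = (V W^{-1}) c:
# predictions become a linear combination of concepts, unchanged.
U = V @ np.linalg.inv(W)             # concept-to-prediction weights
logits_via_concepts = concepts @ U.T

assert np.allclose(logits, logits_via_concepts)
```

Because the map is invertible, the concept-space head reproduces the original logits exactly, which is why no prediction power is lost; the rows of `U` then read as per-class concept weights for explanation or manual intervention.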
Cite
Text
Wang et al. "Concept Bottleneck Model with Zero Performance Loss." Conference on Parsimony and Learning, 2025.

Markdown

[Wang et al. "Concept Bottleneck Model with Zero Performance Loss." Conference on Parsimony and Learning, 2025.](https://mlanthology.org/cpal/2025/wang2025cpal-concept/)

BibTeX
@inproceedings{wang2025cpal-concept,
title = {{Concept Bottleneck Model with Zero Performance Loss}},
author = {Wang, Zhenzhen and Popel, Aleksander and Sulam, Jeremias},
booktitle = {Conference on Parsimony and Learning},
year = {2025},
pages = {433--461},
volume = {280},
url = {https://mlanthology.org/cpal/2025/wang2025cpal-concept/}
}