Concept Bottleneck Model with Zero Performance Loss

Abstract

Interpreting machine learning models with high-level, human-understandable concepts has gained increasing importance. The concept bottleneck model (CBM) is a popular approach for providing such explanations but typically sacrifices some predictive power compared with standard black-box models. In this work, we propose CBM-zero, an approach that turns an off-the-shelf black-box model into a CBM without changing its predictions or compromising predictive power. Through an invertible mapping from the model’s latent space to a concept space, predictions are decomposed into a linear combination of concepts. This provides concept-based explanations for the complex model and allows us to intervene in its predictions manually. Experiments across benchmarks demonstrate that CBM-zero provides comparable explainability and better accuracy than other CBM methods.
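The core idea in the abstract can be illustrated with a minimal NumPy sketch. The setup below is an assumption for illustration only (the paper learns the mapping from data; here a generic random matrix stands in for it): if a black-box model predicts via a linear head on latent features, any invertible map to a concept space lets the same logits be rewritten as a linear combination of concept scores, so the predictions are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3  # latent dimension, number of classes (concept dim = d here)

# Stand-in black-box: latent features h = g(x) and a linear head (W, b)
h = rng.normal(size=d)           # latent representation of one input
W = rng.normal(size=(k, d))      # classification head weights
b = rng.normal(size=k)
logits_blackbox = W @ h + b

# Hypothetical invertible mapping M from latent space to concept space
M = rng.normal(size=(d, d))      # a generic random matrix is invertible w.h.p.
c = M @ h                        # concept scores for this input

# The same prediction, expressed as a linear combination of concepts
W_concept = W @ np.linalg.inv(M)
logits_concept = W_concept @ c + b

# Predictions are identical up to numerical precision: zero performance loss
assert np.allclose(logits_blackbox, logits_concept)
```

Because the decomposition is exact, editing a concept score in `c` and recomputing `W_concept @ c + b` gives a simple form of manual intervention on the prediction.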

Cite

Text

Wang et al. "Concept Bottleneck Model with Zero Performance Loss." Conference on Parsimony and Learning, 2025.

Markdown

[Wang et al. "Concept Bottleneck Model with Zero Performance Loss." Conference on Parsimony and Learning, 2025.](https://mlanthology.org/cpal/2025/wang2025cpal-concept/)

BibTeX

@inproceedings{wang2025cpal-concept,
  title     = {{Concept Bottleneck Model with Zero Performance Loss}},
  author    = {Wang, Zhenzhen and Popel, Aleksander and Sulam, Jeremias},
  booktitle = {Conference on Parsimony and Learning},
  year      = {2025},
  pages     = {433--461},
  volume    = {280},
  url       = {https://mlanthology.org/cpal/2025/wang2025cpal-concept/}
}