Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
Abstract
Recently, interpretable machine learning has re-explored concept bottleneck models (CBM), comprising step-by-step prediction of the high-level concepts from the raw features and of the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, consequently affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's *intervenability* as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable as, and more performant than, CBMs.
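To make the intervention mechanism referred to in the abstract concrete, the sketch below shows a concept bottleneck model and a concept-level intervention. It is a minimal, hypothetical PyTorch example, not the authors' implementation: the class `ConceptBottleneck`, the `g`/`f` submodules, and the `concepts`/`mask` arguments are illustrative assumptions. Concepts are first predicted from the raw input, a user may overwrite a subset of them, and the final prediction is then computed from the (possibly edited) concepts.

```python
# Minimal sketch of a concept bottleneck model (CBM) with a concept-level
# intervention. All names are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # g: raw features -> concept logits;  f: concepts -> target logits
        self.g = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                               nn.Linear(64, n_concepts))
        self.f = nn.Linear(n_concepts, n_classes)

    def forward(self, x, concepts=None, mask=None):
        c_hat = torch.sigmoid(self.g(x))  # predicted concept probabilities
        if concepts is not None and mask is not None:
            # Intervention: replace the masked concepts with user-supplied
            # values, which changes the downstream prediction made by f.
            c_hat = torch.where(mask.bool(), concepts, c_hat)
        return self.f(c_hat), c_hat


# Usage: intervene on the first concept of every example in a batch.
model = ConceptBottleneck(in_dim=10, n_concepts=4, n_classes=3)
x = torch.randn(8, 10)
true_c = torch.randint(0, 2, (8, 4)).float()  # e.g. concept values given by a user
mask = torch.zeros(8, 4)
mask[:, 0] = 1.0                              # intervene only on concept 0
logits_after_intervention, _ = model(x, concepts=true_c, mask=mask)
```

The paper's contribution is to enable this kind of intervention on ordinary black-box networks, which have no explicit concept layer, and to fine-tune them so that such interventions become more effective; the sketch above only illustrates the standard CBM setting that the intervention idea originates from.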
Cite
Text
Marcinkevičs et al. "Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?" NeurIPS 2023 Workshops: XAIA, 2023.
Markdown
[Marcinkevičs et al. "Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?" NeurIPS 2023 Workshops: XAIA, 2023.](https://mlanthology.org/neuripsw/2023/marcinkevics2023neuripsw-beyond/)
BibTeX
@inproceedings{marcinkevics2023neuripsw-beyond,
  title = {{Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?}},
  author = {Marcinkevičs, Ričards and Laguna, Sonia and Vandenhirtz, Moritz and Vogt, Julia},
  booktitle = {NeurIPS 2023 Workshops: XAIA},
  year = {2023},
  url = {https://mlanthology.org/neuripsw/2023/marcinkevics2023neuripsw-beyond/}
}