Adaptive Test-Time Intervention for Concept Bottleneck Models
Abstract
Concept bottleneck models (CBMs) aim to improve model interpretability by predicting human-level "concepts" in a bottleneck within a deep learning model architecture. However, how the predicted concepts are used in predicting the target either remains black-box or is simplified to maintain interpretability at the cost of prediction performance. We propose using Fast Interpretable Greedy Sum-Trees (FIGS) to obtain Binary Distillation (BD). This new method, called FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model while maintaining the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across 4 datasets, we demonstrate that our adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow only limited concept interventions.
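To make the distillation step described above concrete, the sketch below shows one way to fit a FIGS student on a CBM's binarized concept predictions. This is a minimal illustration, not the authors' implementation: the CBM methods `predict_concepts` and `predict_from_concepts` are hypothetical names assumed here, and the student uses the `FIGSClassifier` estimator from the open-source `imodels` package.

```python
# Minimal sketch of concept-to-target distillation (assumptions noted above).
import numpy as np
from imodels import FIGSClassifier  # sum-of-trees model with a scikit-learn API

def distill_concept_head(cbm, X_train, threshold=0.5):
    """Fit a FIGS student that mimics the CBM's concept-to-target mapping."""
    c_hat = cbm.predict_concepts(X_train)         # hypothetical: soft concept scores, shape (n, k)
    c_bin = (c_hat >= threshold).astype(int)      # binary-augmented concept features
    y_teacher = cbm.predict_from_concepts(c_hat)  # hypothetical: teacher's target predictions
    student = FIGSClassifier()
    student.fit(c_bin, y_teacher)                 # student learns an additive set of shallow trees
    return student
```

Because the fitted student is an additive collection of shallow trees over binary concepts, its per-tree contributions can be read off as concept-interaction attributions, which is what the paper uses to rank concepts for adaptive test-time intervention.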
Cite
Text
Shen et al. "Adaptive Test-Time Intervention for Concept Bottleneck Models." ICLR 2025 Workshops: BuildingTrust, 2025.
Markdown
[Shen et al. "Adaptive Test-Time Intervention for Concept Bottleneck Models." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/shen2025iclrw-adaptive/)
BibTeX
@inproceedings{shen2025iclrw-adaptive,
title = {{Adaptive Test-Time Intervention for Concept Bottleneck Models}},
author = {Shen, Matthew and Hsu, Aliyah R. and Agarwal, Abhineet and Yu, Bin},
booktitle = {ICLR 2025 Workshops: BuildingTrust},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/shen2025iclrw-adaptive/}
}