Bayesian Concept Bottleneck Models with LLM Priors
Abstract
Concept Bottleneck Models (CBMs) have been proposed as a compromise between white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy. The standard training procedure for CBMs is to predefine a candidate set of human-interpretable concepts, extract their values from the training data, and identify a sparse subset as inputs to a transparent prediction model. However, such approaches are often hampered by the tradeoff between exploring a sufficiently large set of concepts versus controlling the cost of obtaining concept extractions, resulting in a large interpretability-accuracy tradeoff. This work investigates a novel approach that sidesteps these challenges: BC-LLM iteratively searches over a potentially infinite set of concepts within a Bayesian framework, in which Large Language Models (LLMs) serve as both a concept extraction mechanism and prior. Even though LLMs can be miscalibrated and hallucinate, we prove that BC-LLM can provide rigorous statistical inference and uncertainty quantification. Across image, text, and tabular datasets, BC-LLM outperforms interpretable baselines and even black-box models in certain settings, converges more rapidly towards relevant concepts, and is more robust to out-of-distribution samples.
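The abstract is terse about mechanics, so the following is a minimal, hypothetical sketch of the kind of loop it describes: a Metropolis-style search over concept subsets in which stand-in functions (llm_propose_concept, llm_extract) mark the points where BC-LLM would query an LLM as proposal/prior and as concept extractor. All names, the candidate pool, and the plug-in score below are illustrative assumptions rather than the paper's implementation, and the prior and proposal-correction terms of the acceptance ratio are omitted for brevity.

```python
"""Illustrative sketch (not the authors' code): MCMC-style search over concept
subsets, with stand-in functions where BC-LLM would query an LLM."""
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical finite concept pool standing in for the LLM's open-ended proposals.
CANDIDATE_CONCEPTS = [f"concept_{i}" for i in range(20)]

def llm_propose_concept(current, rng):
    """Stand-in for prompting an LLM to propose a replacement concept."""
    pool = [c for c in CANDIDATE_CONCEPTS if c not in current]
    return rng.choice(pool)

def llm_extract(concepts, n_samples, rng):
    """Stand-in for LLM concept extraction: binary annotations per sample."""
    return rng.integers(0, 2, size=(n_samples, len(concepts))).astype(float)

def log_score(X, y):
    """Crude plug-in for the marginal likelihood: fitted log-likelihood."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(X)[:, 1]
    return float(np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))

# Toy labels; in practice these are the training labels.
n, k, n_iters = 200, 3, 50
y = rng.integers(0, 2, size=n)

concepts = list(rng.choice(CANDIDATE_CONCEPTS, size=k, replace=False))
X = llm_extract(concepts, n, rng)
score = log_score(X, y)

for _ in range(n_iters):
    # Propose swapping one concept, re-extract its values, then accept/reject.
    idx = rng.integers(k)
    proposal = concepts.copy()
    proposal[idx] = llm_propose_concept(concepts, rng)
    X_prop = llm_extract(proposal, n, rng)
    score_prop = log_score(X_prop, y)
    if np.log(rng.uniform()) < score_prop - score:  # simplified Metropolis step
        concepts, X, score = proposal, X_prop, score_prop

print("final concept set:", concepts, "score:", round(score, 2))
```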
Cite
Text

Feng et al. "Bayesian Concept Bottleneck Models with LLM Priors." Advances in Neural Information Processing Systems, 2025.

Markdown

[Feng et al. "Bayesian Concept Bottleneck Models with LLM Priors." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/feng2025neurips-bayesian/)

BibTeX
@inproceedings{feng2025neurips-bayesian,
  title     = {{Bayesian Concept Bottleneck Models with LLM Priors}},
  author    = {Feng, Jean and Kothari, Avni and Zier, Lucas and Singh, Chandan and Tan, Yan Shuo},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/feng2025neurips-bayesian/}
}