Bayesian Concept Bottleneck Models with LLM Priors
Abstract
Concept Bottleneck Models (CBMs) have been proposed as a compromise between white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy. The standard training procedure for CBMs is to predefine a candidate set of human-interpretable concepts, extract their values from the training data, and identify a sparse subset as inputs to a transparent prediction model. However, such approaches are often hampered by the tradeoff between exploring a sufficiently large set of concepts versus controlling the cost of obtaining concept extractions, resulting in a large interpretability-accuracy tradeoff. This work investigates a novel approach that sidesteps these challenges: BC-LLM iteratively searches over a potentially infinite set of concepts within a Bayesian framework, in which Large Language Models (LLMs) serve as both a concept extraction mechanism and prior. Even though LLMs can be miscalibrated and hallucinate, we prove that BC-LLM can provide rigorous statistical inference and uncertainty quantification. Across image, text, and tabular datasets, BC-LLM outperforms interpretable baselines and even black-box models in certain settings, converges more rapidly towards relevant concepts, and is more robust to out-of-distribution samples.
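The abstract is terse about mechanics, so the following is a minimal, hypothetical sketch of the kind of loop it describes: a Metropolis-style search over concept subsets in which stand-in functions (llm_propose_concept, llm_extract) mark the points where BC-LLM would query an LLM as proposal/prior and as concept extractor. All names, the candidate pool, and the plug-in score below are illustrative assumptions rather than the paper's implementation, and the prior and proposal-correction terms of the acceptance ratio are omitted for brevity.

```python
"""Illustrative sketch (not the authors' code): MCMC-style search over concept
subsets, with stand-in functions where BC-LLM would query an LLM."""
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical finite concept pool standing in for the LLM's open-ended proposals.
CANDIDATE_CONCEPTS = [f"concept_{i}" for i in range(20)]

def llm_propose_concept(current, rng):
    """Stand-in for prompting an LLM to propose a replacement concept."""
    pool = [c for c in CANDIDATE_CONCEPTS if c not in current]
    return rng.choice(pool)

def llm_extract(concepts, n_samples, rng):
    """Stand-in for LLM concept extraction: binary annotations per sample."""
    return rng.integers(0, 2, size=(n_samples, len(concepts))).astype(float)

def log_score(X, y):
    """Crude plug-in for the marginal likelihood: fitted log-likelihood."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(X)[:, 1]
    return float(np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))

# Toy labels; in practice these are the training labels.
n, k, n_iters = 200, 3, 50
y = rng.integers(0, 2, size=n)

concepts = list(rng.choice(CANDIDATE_CONCEPTS, size=k, replace=False))
X = llm_extract(concepts, n, rng)
score = log_score(X, y)

for _ in range(n_iters):
    # Propose swapping one concept, re-extract its values, then accept/reject.
    idx = rng.integers(k)
    proposal = concepts.copy()
    proposal[idx] = llm_propose_concept(concepts, rng)
    X_prop = llm_extract(proposal, n, rng)
    score_prop = log_score(X_prop, y)
    if np.log(rng.uniform()) < score_prop - score:  # simplified Metropolis step
        concepts, X, score = proposal, X_prop, score_prop

print("final concept set:", concepts, "score:", round(score, 2))
```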
Cite
Text

Feng et al. "Bayesian Concept Bottleneck Models with LLM Priors." Advances in Neural Information Processing Systems, 2025.

Markdown

[Feng et al. "Bayesian Concept Bottleneck Models with LLM Priors." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/feng2025neurips-bayesian/)

BibTeX
@inproceedings{feng2025neurips-bayesian,
  title     = {{Bayesian Concept Bottleneck Models with LLM Priors}},
  author    = {Feng, Jean and Kothari, Avni and Zier, Lucas and Singh, Chandan and Tan, Yan Shuo},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/feng2025neurips-bayesian/}
}