Concept Bottleneck Generative Models

Abstract

Despite their increasing prevalence, generative models remain opaque and difficult to steer reliably. To address these challenges, we present concept bottleneck (CB) generative models, a class of generative models in which one internal layer, the concept bottleneck (CB) layer, is constrained to encode human-understandable features. While concept bottleneck layers have been used to improve interpretability in supervised learning tasks, here we extend them to generative models. The concept bottleneck layer partitions the generative model into three parts: the pre-concept bottleneck portion, the CB layer, and the post-concept bottleneck portion. To train CB generative models, we complement the traditional task-based loss function for training generative models with three additional loss terms: a concept loss, an orthogonality loss, and a concept sensitivity loss. The CB layer and these corresponding loss terms are model agnostic, which we demonstrate by applying them to three different families of generative models: generative adversarial networks, variational autoencoders, and diffusion models. On real-world datasets, across all three model families, steering a generative model through the CB layer outperforms several baselines.
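The abstract does not give implementation details, but the architecture it describes (pre-CB features mapped to supervised concept units plus a residual code, trained with a concept loss and an orthogonality loss on top of the usual generative loss) can be sketched as follows. This is a minimal, hypothetical PyTorch-style sketch with assumed layer names, shapes, and loss formulations; the paper's exact definitions (including the concept sensitivity loss, omitted here) may differ.

```python
# Hypothetical sketch of a concept bottleneck (CB) layer and two of the auxiliary
# losses described in the abstract. Names and loss forms are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptBottleneckLayer(nn.Module):
    """Maps pre-CB features to (concept activations, unconstrained residual code)."""

    def __init__(self, feat_dim: int, n_concepts: int, residual_dim: int):
        super().__init__()
        self.to_concepts = nn.Linear(feat_dim, n_concepts)    # human-understandable units
        self.to_residual = nn.Linear(feat_dim, residual_dim)  # remaining, non-concept information

    def forward(self, h: torch.Tensor):
        c = torch.sigmoid(self.to_concepts(h))  # predicted concept probabilities
        r = self.to_residual(h)                 # residual code passed to the post-CB portion
        return c, r


def cb_auxiliary_losses(c_pred: torch.Tensor, c_true: torch.Tensor, residual: torch.Tensor):
    """Illustrative concept loss and orthogonality-style loss (one plausible instantiation)."""
    # Concept loss: supervise the CB units with annotated concept labels.
    concept_loss = F.binary_cross_entropy(c_pred, c_true)

    # Orthogonality-style loss: penalize cross-covariance between concept and
    # residual codes so concept information does not leak into the residual.
    c_cent = c_pred - c_pred.mean(dim=0, keepdim=True)
    r_cent = residual - residual.mean(dim=0, keepdim=True)
    cross_cov = c_cent.T @ r_cent / c_pred.shape[0]
    orth_loss = (cross_cov ** 2).mean()

    return concept_loss, orth_loss


# Usage sketch: the total objective adds these terms (and a concept sensitivity
# loss, not shown) to the model's usual task loss, e.g.
#   total = task_loss + lambda_c * concept_loss + lambda_o * orth_loss + lambda_s * sensitivity_loss
```

Because the CB layer only constrains one internal representation and the auxiliary losses attach to that representation, the same construction can in principle be dropped into a GAN generator, a VAE decoder, or a diffusion denoiser, which is the model-agnostic claim made in the abstract.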

Cite

Text

Ismail et al. "Concept Bottleneck Generative Models." ICML 2023 Workshops: DeployableGenerativeAI, 2023.

Markdown

[Ismail et al. "Concept Bottleneck Generative Models." ICML 2023 Workshops: DeployableGenerativeAI, 2023.](https://mlanthology.org/icmlw/2023/ismail2023icmlw-concept/)

BibTeX

@inproceedings{ismail2023icmlw-concept,
  title     = {{Concept Bottleneck Generative Models}},
  author    = {Ismail, Aya Abdelsalam and Adebayo, Julius and Bravo, Hector Corrada and Ra, Stephen and Cho, Kyunghyun},
  booktitle = {ICML 2023 Workshops: DeployableGenerativeAI},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/ismail2023icmlw-concept/}
}