Explainable Concept Generation Through Vision-Language Preference Learning

Abstract

Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly tied to individual input features. For instance, the concept of "stripes" is important for classifying an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and curate multiple candidate concept image sets, a process that is often imprecise. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes the vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate the capability of our method to articulate complex, abstract concepts that are otherwise challenging to craft manually.
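The abstract builds on concept-based testing in the style of TCAV (Kim et al., 2018), where a candidate concept image set is scored for its importance to a target class. As a rough illustration of how a generated concept set could be evaluated in that framework, below is a minimal TCAV-style sketch in PyTorch; the ResNet-50 model, the choice of `layer4`, and the `concept_imgs` / `random_imgs` / `class_imgs` tensors are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

# Frozen classifier under inspection (illustrative choice, not the paper's).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

# Capture activations at an intermediate layer via a forward hook.
acts = {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(layer=o))

def layer_activations(batch):
    """Flattened layer4 activations for a batch of normalized images."""
    with torch.no_grad():
        model(batch)
    return acts["layer"].flatten(1).numpy()

def compute_cav(concept_imgs, random_imgs):
    """Concept Activation Vector: the (unit-normalized) normal of a linear
    classifier separating concept activations from random-image activations."""
    X = np.concatenate([layer_activations(concept_imgs),
                        layer_activations(random_imgs)])
    y = np.array([1] * len(concept_imgs) + [0] * len(random_imgs))
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return w / np.linalg.norm(w)

def tcav_score(class_imgs, target_class, cav):
    """Fraction of class images whose target-class logit increases along the
    CAV direction (sign of the directional derivative) = concept importance."""
    hits = 0
    for img in class_imgs:
        model(img.unsqueeze(0))  # forward pass fills acts["layer"]
        a = acts["layer"].detach().requires_grad_(True)
        # Re-run the tail of ResNet-50 (avgpool -> fc) from the hooked layer.
        logit = model.fc(model.avgpool(a).flatten(1))[0, target_class]
        grad = torch.autograd.grad(logit, a)[0].flatten().numpy()
        hits += int(grad @ cav > 0)
    return hits / len(class_imgs)
```

In this framing, the paper's contribution would replace a manually guessed `concept_imgs` set with images from a preference-tuned vision-language generative model; concept sets that score highly under such a test correspond to concepts the classifier actually relies on.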

Cite

Text

Taparia et al. "Explainable Concept Generation Through Vision-Language Preference Learning." NeurIPS 2024 Workshops: InterpretableAI, 2024.

Markdown

[Taparia et al. "Explainable Concept Generation Through Vision-Language Preference Learning." NeurIPS 2024 Workshops: InterpretableAI, 2024.](https://mlanthology.org/neuripsw/2024/taparia2024neuripsw-explainable/)

BibTeX

@inproceedings{taparia2024neuripsw-explainable,
  title     = {{Explainable Concept Generation Through Vision-Language Preference Learning}},
  author    = {Taparia, Aditya and Sagar, Som and Senanayake, Ransalu},
  booktitle = {NeurIPS 2024 Workshops: InterpretableAI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/taparia2024neuripsw-explainable/}
}