Explainable Concept Generation Through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations

Abstract

Understanding the inner representation of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important for classifying an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. To address this limitation, we frame concept image set creation as an image generation problem. Because naively using a standard generative model does not result in meaningful concepts, we devise a reinforcement learning-based preference optimization (RLPO) algorithm that fine-tunes a vision-language generative model from approximate textual descriptions of concepts. Through a series of experiments, we demonstrate our method’s ability to efficiently and reliably articulate diverse concepts that are otherwise challenging to craft manually.

Cite

Text

Taparia et al. "Explainable Concept Generation Through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Taparia et al. "Explainable Concept Generation Through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/taparia2025icml-explainable/)

BibTeX

@inproceedings{taparia2025icml-explainable,
  title     = {{Explainable Concept Generation Through Vision-Language Preference Learning for Understanding Neural Networks’ Internal Representations}},
  author    = {Taparia, Aditya and Sagar, Som and Senanayake, Ransalu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {59154--59181},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/taparia2025icml-explainable/}
}