TExplain: Post-Hoc Textual Explanation of Image Classifiers with Pre-Trained Language Models

Abstract

We propose TExplain, a method that uses language models to interpret the features learned by pre-trained image classifiers. TExplain connects the classifier's feature space to a language model and generates explanatory sentences at inference time. Extracting the most frequent words from these sentences gives insight into the features and patterns the classifier has learned. The method can also detect spurious correlations and biases, giving a deeper understanding of the classifier's behavior. Experiments on diverse datasets, including ImageNet-9L and Waterbirds, show its potential for improving the interpretability and robustness of image classifiers.
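
The frequent-word step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `frequent_words`, the stop-word list, and the example sentences are all hypothetical, and the sentences are made-up stand-ins for what a language model might generate for one class.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "with", "at", "to"}

def frequent_words(explanations, top_k=3):
    """Return the top_k most frequent non-stop-words across all sentences."""
    counts = Counter()
    for sentence in explanations:
        # Lowercase and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_k)

# Made-up sentences standing in for generated explanations of one class.
explanations = [
    "a bird standing on the water near a lake",
    "a small bird flying over water",
    "a bird on a branch with water in the background",
]

print(frequent_words(explanations, top_k=2))  # [('bird', 3), ('water', 3)]
```

In this toy case, "water" appears as often as "bird", which is the kind of signal that might point to a background feature the classifier relies on. Whether such a word reflects a genuinely spurious correlation would still need to be checked against the data.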

Cite

Text

Asgari et al. "Texplain: Post-Hoc Textual Explanation of Image Classifiers with Pre-Trained Language Models." ICLR 2024 Workshops: R2-FM, 2024.

Markdown

[Asgari et al. "Texplain: Post-Hoc Textual Explanation of Image Classifiers with Pre-Trained Language Models." ICLR 2024 Workshops: R2-FM, 2024.](https://mlanthology.org/iclrw/2024/asgari2024iclrw-texplain/)

BibTeX

@inproceedings{asgari2024iclrw-texplain,
  title     = {{Texplain: Post-Hoc Textual Explanation of Image Classifiers with Pre-Trained Language Models}},
  author    = {Asgari, Saeid and Khani, Aliasghar and Khasahmadi, Amir Hosein and Sanghi, Aditya and Willis, Karl D.D. and Amiri, Ali Mahdavi},
  booktitle = {ICLR 2024 Workshops: R2-FM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/asgari2024iclrw-texplain/}
}