A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

Abstract

We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head" cross-attention, INTR could identify different "attributes" of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR.
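The core mechanism the abstract describes — one learned query per class attending over image-patch features, with the attention weights doubling as the explanation — can be sketched in a few lines of NumPy. This is a minimal single-head illustration under assumed shapes and a made-up shared scoring vector, not the authors' implementation (INTR uses a full DETR-style multi-head, multi-layer decoder):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_patches, d = 5, 49, 32

# Learnable "class-specific" queries: one query vector per class (assumption:
# initialized randomly here; in INTR these are trained end to end).
class_queries = rng.normal(size=(num_classes, d))
# Encoder output: one feature vector per image patch.
patch_features = rng.normal(size=(num_patches, d))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Cross-attention: each class query searches for itself over all patches.
scores = class_queries @ patch_features.T / np.sqrt(d)   # (C, N)
attn = softmax(scores, axis=-1)                          # each row sums to 1
# The rows of `attn` are the per-class attention maps used for interpretation.
attended = attn @ patch_features                         # (C, d) evidence per class

# Hypothetical shared scoring vector turning each class's attended
# feature into that class's logit.
w = rng.normal(size=(d,))
logits = attended @ w                                    # (C,)
pred = int(np.argmax(logits))
```

Because the logit for class `c` is computed only from what query `c` attended to, row `c` of `attn` directly shows which patches supported that class's score.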

Cite

Text

Paul et al. "A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis." International Conference on Learning Representations, 2024.

Markdown

[Paul et al. "A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/paul2024iclr-simple/)

BibTeX

@inproceedings{paul2024iclr-simple,
  title     = {{A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis}},
  author    = {Paul, Dipanjyoti and Chowdhury, Arpita and Xiong, Xinqi and Chang, Feng-Ju and Carlyn, David Edward and Stevens, Samuel and Provost, Kaiya L and Karpatne, Anuj and Carstens, Bryan and Rubenstein, Daniel and Stewart, Charles and Berger-Wolf, Tanya and Su, Yu and Chao, Wei-Lun},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/paul2024iclr-simple/}
}