On the Information Geometry of Vision Transformers

Abstract

Understanding the structure of the high-dimensional representations learned by Vision Transformers (ViTs) provides a pathway toward a mechanistic understanding of these models and toward better architecture design. In this work, we leverage tools from information geometry to characterize representation quality at the per-token (intra-token) level and across pairs of tokens (inter-token) in ViTs pretrained for object classification. In particular, we observe that these high-dimensional tokens exhibit a characteristic spectral decay in the feature covariance matrix. By measuring the rate of this decay (denoted by $\alpha$) for each token across transformer blocks, we discover an $\alpha$ signature, indicative of a transition from lower to higher effective dimensionality. We also demonstrate that tokens can be clustered based on their $\alpha$ signature, revealing that tokens corresponding to nearby spatial patches of the original image exhibit similar $\alpha$ trajectories. Furthermore, to measure complexity at the sequence level, we aggregate the correlation between pairs of tokens independently at each transformer block. A higher average correlation indicates significant overlap between token representations and lower effective complexity. Notably, we observe a U-shaped trend across the model hierarchy, suggesting that token representations are more expressive in the intermediate blocks. Our findings provide a framework for understanding information processing in ViTs, along with tools to prune or merge tokens across blocks, thereby making the architectures more efficient.
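The two quantities described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the covariance eigenvalues follow a power law $\lambda_i \propto i^{-\alpha}$ and estimates $\alpha$ by a least-squares fit in log-log space, and it assumes the sequence-level measure is the plain mean of off-diagonal Pearson correlations between tokens. The function names `spectral_decay_alpha` and `mean_token_correlation`, their parameters, and the input shapes are hypothetical choices made for illustration.

```python
import numpy as np


def spectral_decay_alpha(features, k_min=1, k_max=None):
    """Estimate the spectral decay rate alpha for one token at one block.

    features: array of shape (n_samples, d), one row per input image.
    Assumption: eigenvalues follow lambda_i ~ i^(-alpha), so alpha is the
    negative slope of log(eigenvalue) against log(rank).
    """
    # Center the features and form the (d, d) sample covariance matrix.
    x = features - features.mean(axis=0, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)

    # eigvalsh returns ascending eigenvalues; reverse to descending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Drop non-positive values so the logarithm is defined.
    eigvals = eigvals[eigvals > 0]

    k_max = len(eigvals) if k_max is None else min(k_max, len(eigvals))
    ranks = np.arange(k_min, k_max + 1)

    # Linear fit in log-log space; the slope is -alpha.
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals[k_min - 1:k_max]), 1)
    return -slope


def mean_token_correlation(tokens):
    """Average pairwise Pearson correlation between tokens at one block.

    tokens: array of shape (n_tokens, d) for a single image.
    Assumption: aggregation is the mean over all off-diagonal pairs.
    """
    # np.corrcoef treats each row as a variable: result is (n_tokens, n_tokens).
    corr = np.corrcoef(tokens)
    n = corr.shape[0]
    off_diag = corr[~np.eye(n, dtype=bool)]
    return off_diag.mean()
```

Applying `spectral_decay_alpha` to the same token at every block would give one $\alpha$ trajectory per token, and applying `mean_token_correlation` at every block would give one value per block. Under the assumptions above, a larger $\alpha$ means faster eigenvalue decay and therefore lower effective dimensionality.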

Cite

Text

Joseph et al. "On the Information Geometry of Vision Transformers." NeurIPS 2023 Workshops: NeurReps, 2023.

BibTeX

@inproceedings{joseph2023neuripsw-information,
  title     = {{On the Information Geometry of Vision Transformers}},
  author    = {Joseph, Sonia and Agrawal, Kumar Krishna and Ghosh, Arna and Richards, Blake Aaron},
  booktitle = {NeurIPS 2023 Workshops: NeurReps},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/joseph2023neuripsw-information/}
}