Beyond Decodability: Linear Feature Spaces Enable Visual Compositional Generalization

Abstract

While compositional generalization is fundamental to human intelligence, we still lack a clear understanding of how neural networks combine learned representations of parts into novel wholes. We investigate whether neural networks represent inputs as linear sums of their simpler constituent parts. Our analysis reveals that models trained from scratch often exhibit decodability, where concept features can be linearly decoded with high accuracy, yet lack linear additive structure, which prevents the models from generalizing zero-shot. Instead, linearity of representations only arises with high training-data diversity. We prove that when representations are linear, perfect generalization to novel concept combinations is possible with minimal training data. Empirically evaluating large-scale pretrained models through this lens reveals that they achieve strong generalization for certain concept types while still falling short of the ideal linear structure for others.
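The distinction the abstract draws can be illustrated with a toy sketch (not the paper's actual setup; concept names, dimensions, and the additive construction below are illustrative assumptions). If a feature space is linear, i.e. the representation of a (shape, color) input is the sum of a shape vector and a color vector, then a linear decoder fit on a subset of combinations classifies a held-out combination zero-shot:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical feature dimension

# Hypothetical concept vectors for two shapes and two colors.
shapes = {s: rng.normal(size=d) for s in ["circle", "square"]}
colors = {c: rng.normal(size=d) for c in ["red", "blue"]}

def rep(shape, color):
    # Linear feature space: the representation of a composite input
    # is the sum of its constituent concept vectors.
    return shapes[shape] + colors[color]

# Fit a linear color decoder on three of the four combinations,
# holding out ("square", "blue") entirely.
train = [("circle", "red"), ("circle", "blue"), ("square", "red")]
X = np.stack([rep(s, c) for s, c in train])
y = np.array([1.0 if c == "blue" else -1.0 for _, c in train])
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # min-norm exact fit

# Zero-shot prediction on the unseen combination: because
# rep("square","blue") = rep("circle","blue") + rep("square","red")
#                        - rep("circle","red"),
# the decoder's score is y2 + y3 - y1 = 1, i.e. correctly "blue".
score = rep("square", "blue") @ w
print(score)
```

A merely decodable but non-linear feature space would still let the decoder fit the three training points exactly, but the held-out representation would no longer be a signed sum of training representations, so the zero-shot score carries no such guarantee.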

Cite

Text

Uselis et al. "Beyond Decodability: Linear Feature Spaces Enable Visual Compositional Generalization." ICLR 2025 Workshops: SCSL, 2025.

Markdown

[Uselis et al. "Beyond Decodability: Linear Feature Spaces Enable Visual Compositional Generalization." ICLR 2025 Workshops: SCSL, 2025.](https://mlanthology.org/iclrw/2025/uselis2025iclrw-beyond/)

BibTeX

@inproceedings{uselis2025iclrw-beyond,
  title     = {{Beyond Decodability: Linear Feature Spaces Enable Visual Compositional Generalization}},
  author    = {Uselis, Arnas and Dittadi, Andrea and Oh, Seong Joon},
  booktitle = {ICLR 2025 Workshops: SCSL},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/uselis2025iclrw-beyond/}
}