Object-Centric Representations Generalize Better Compositionally with Less Compute

Abstract

Compositional generalization—the ability to reason about novel combinations of familiar concepts—is fundamental to human cognition and a critical challenge for machine learning. Object-centric representation learning has been proposed as a promising approach for achieving this capability. However, systematic evaluation of these methods in visually complex settings remains limited. In this work, we introduce a benchmark to measure how well vision encoders, with and without object-centric biases, generalize to unseen combinations of object properties. Using CLEVRTex-style images, we create multiple training splits with partial coverage of object property combinations and generate question–answer pairs to assess compositional generalization on a held-out test set. We focus on comparing pretrained foundation models with object-centric models that incorporate such foundation models as backbones, a leading approach in this domain. To ensure a fair and comprehensive comparison, we carefully account for differences in representation format. In this preliminary study, we use DINOv2 as the foundation model and DINOSAURv2 as its object-centric counterpart. We control for compute budget and differences in image representation sizes to ensure robustness. Our key findings reveal that object-centric approaches (1) converge faster on in-distribution data but underperform slightly when non-object-centric models are given a significant compute advantage, and (2) exhibit superior compositional generalization, outperforming DINOv2 on unseen combinations of object properties while requiring approximately four to eight times less downstream compute.

Cite

Text

Kapl et al. "Object-Centric Representations Generalize Better Compositionally with Less Compute." ICLR 2025 Workshops: SCSL, 2025.

Markdown

[Kapl et al. "Object-Centric Representations Generalize Better Compositionally with Less Compute." ICLR 2025 Workshops: SCSL, 2025.](https://mlanthology.org/iclrw/2025/kapl2025iclrw-objectcentric/)

BibTeX

@inproceedings{kapl2025iclrw-objectcentric,
  title     = {{Object-Centric Representations Generalize Better Compositionally with Less Compute}},
  author    = {Kapl, Ferdinand and Mamaghan, Amir Mohammad Karimi and Horn, Max and Marr, Carsten and Bauer, Stefan and Dittadi, Andrea},
  booktitle = {ICLR 2025 Workshops: SCSL},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/kapl2025iclrw-objectcentric/}
}