Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning

Abstract

We propose the Compact Clustering Attention (COCA) layer, an effective building block that introduces a hierarchical strategy for object-centric representation learning while solving the unsupervised object discovery task on single images. COCA is an attention-based clustering module that, when cascaded into a bottom-up hierarchical network architecture referred to as COCA-Net, extracts object-centric representations from multi-object scenes. At its core, COCA uses a novel clustering algorithm that leverages the physical concept of compactness to highlight distinct object centroids in a scene, providing a spatial inductive bias. Thanks to this strategy, COCA-Net generates high-quality segmentation masks on both the decoder side and, notably, the encoder side of its pipeline. Additionally, COCA-Net is not bound to a predetermined number of output object masks, and it handles the segmentation of background elements better than its competitors. We demonstrate COCA-Net's segmentation performance on six widely adopted datasets, achieving superior or competitive results against state-of-the-art models across nine evaluation metrics.
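The abstract names compactness as the concept that singles out object centroids but does not define it here. As a purely illustrative sketch, assuming compactness is measured as the inverse spatial spread of each pixel's softmax affinity map (our assumption; the paper's actual formulation may differ), the intuition can be shown on a toy grid: a pixel whose affinity mass is concentrated in a small region scores higher than one whose affinity is spread over a large background. The names `affinity_map`, `compactness`, and `compactness_scores` are hypothetical.

```python
import math

# Hypothetical illustration only: "compactness" is ASSUMED here to be the
# inverse weighted spatial spread of a pixel's affinity map; this is not
# claimed to be the COCA paper's definition.

def affinity_map(features, i, temperature=1.0):
    # Softmax over dot-product similarities between pixel i and every pixel.
    logits = [sum(a * b for a, b in zip(features[i], f)) / temperature for f in features]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compactness(weights, coords, eps=1e-6):
    # Weighted centroid of the affinity map in (row, col) coordinates.
    cy = sum(w * y for w, (y, _) in zip(weights, coords))
    cx = sum(w * x for w, (_, x) in zip(weights, coords))
    # Weighted spatial spread (a moment-of-inertia-like quantity).
    spread = sum(w * ((y - cy) ** 2 + (x - cx) ** 2) for w, (y, x) in zip(weights, coords))
    # Tightly concentrated maps get high scores; diffuse maps get low ones.
    return 1.0 / (spread + eps)

def compactness_scores(features, coords):
    # One score per pixel, computed from that pixel's own affinity map.
    return [compactness(affinity_map(features, i), coords) for i in range(len(features))]

# Toy 4x4 "image": a 2x2 object in the top-left corner, background elsewhere.
coords = [(r, c) for r in range(4) for c in range(4)]
features = [[5.0, 0.0] if r < 2 and c < 2 else [0.0, 5.0] for r, c in coords]
scores = compactness_scores(features, coords)
best = max(range(len(scores)), key=scores.__getitem__)
print(coords[best])  # prints (0, 0), a pixel of the small object
```

In this toy setup all four object pixels share the same affinity map, so they tie for the highest score, while background pixels, whose affinity spreads over twelve pixels, score lower. Selecting several distinct centroids would additionally require some form of suppression between similar pixels, which this sketch omits.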

Cite

Text

Kucuksozen and Yemez. "Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02364

Markdown

[Kucuksozen and Yemez. "Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/kucuksozen2025cvpr-hierarchical/) doi:10.1109/CVPR52734.2025.02364

BibTeX

@inproceedings{kucuksozen2025cvpr-hierarchical,
  title     = {{Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning}},
  author    = {Kucuksozen, Can and Yemez, Yucel},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {25388-25398},
  doi       = {10.1109/CVPR52734.2025.02364},
  url       = {https://mlanthology.org/cvpr/2025/kucuksozen2025cvpr-hierarchical/}
}