SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Abstract

Unsupervised object-centric learning aims to decompose scenes into interpretable object entities, termed slots. Slot-based auto-encoders stand out as a prominent method for this task. Within them, crucial aspects include guiding the encoder to generate object-specific slots and ensuring the decoder utilizes them during reconstruction. This work introduces two novel techniques: (i) an attention-based self-training approach, which distills superior slot-based attention masks from the decoder to the encoder, enhancing object segmentation, and (ii) an innovative patch-order permutation strategy for autoregressive transformers that strengthens the role of slot vectors in reconstruction. The effectiveness of these strategies is showcased experimentally. The combined approach significantly surpasses prior slot-based auto-encoder methods in unsupervised object segmentation, especially with complex real-world images. We provide the implementation code at https://github.com/gkakogeorgiou/spot.
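To make the patch-order permutation idea in (ii) concrete, the following is a minimal sketch, not the authors' implementation. It assumes the image is split into a height x width grid of patches that an autoregressive decoder predicts one at a time, and that a "patch order" is simply a permutation of the flat patch indices. The function names (`patch_orders`, `permute`, `unpermute`) and the four scan orders chosen here are illustrative assumptions; the abstract does not specify which orders are used.

```python
from typing import List, Sequence


def patch_orders(height: int, width: int) -> List[List[int]]:
    """Return a few scan orders over a height x width patch grid.

    Each order is a permutation of the flat indices 0..height*width-1,
    where index r * width + c denotes the patch at row r, column c.
    The selection of orders is an illustrative assumption.
    """
    row_major = [r * width + c for r in range(height) for c in range(width)]
    col_major = [r * width + c for c in range(width) for r in range(height)]
    return [
        row_major,        # left-to-right, top-to-bottom
        col_major,        # top-to-bottom, left-to-right
        row_major[::-1],  # row-major, reversed
        col_major[::-1],  # column-major, reversed
    ]


def permute(sequence: Sequence, order: Sequence[int]) -> list:
    """Reorder a row-major patch sequence so it follows the given scan order."""
    return [sequence[i] for i in order]


def unpermute(sequence: Sequence, order: Sequence[int]) -> list:
    """Invert permute(): map a reordered sequence back to row-major layout."""
    restored = [None] * len(order)
    for position, index in enumerate(order):
        restored[index] = sequence[position]
    return restored
```

One plausible way to use this, again as an assumption, is to pick one order per training step, feed the decoder the permuted target sequence, and apply `unpermute` to its predictions before computing the reconstruction. Because the decoder can no longer rely on a single fixed neighbourhood of previously generated patches, it has more reason to draw on the slot vectors.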

Cite

Text

Kakogeorgiou et al. "SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02149

Markdown

[Kakogeorgiou et al. "SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/kakogeorgiou2024cvpr-spot/) doi:10.1109/CVPR52733.2024.02149

BibTeX

@inproceedings{kakogeorgiou2024cvpr-spot,
  title     = {{SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers}},
  author    = {Kakogeorgiou, Ioannis and Gidaris, Spyros and Karantzalos, Konstantinos and Komodakis, Nikos},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {22776--22786},
  doi       = {10.1109/CVPR52733.2024.02149},
  url       = {https://mlanthology.org/cvpr/2024/kakogeorgiou2024cvpr-spot/}
}