Parallelized Spatiotemporal Slot Binding for Videos

Abstract

While modern best practices advocate for scalable architectures that support long-range interactions, object-centric models have yet to fully embrace them. In particular, existing object-centric models for sequential inputs rely on RNN-based implementations, which suffer from poor stability and limited capacity and are slow to train on long sequences. We introduce the Parallelizable Spatiotemporal Binder (PSB), the first temporally parallelizable slot-learning architecture for sequential inputs. Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time steps in parallel. It does so by refining an initial set of slots across all time steps through a fixed number of layers equipped with causal attention. By capitalizing on the parallelism induced by this architecture, the proposed model trains significantly more efficiently. In experiments, we test PSB extensively as an encoder within an auto-encoding framework paired with a wide variety of decoder options. Compared to the state-of-the-art, our architecture trains stably on longer sequences, achieves a 60% increase in training speed through parallelization, and performs on par with or better than prior approaches on unsupervised 2D and 3D object-centric scene decomposition and understanding.
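The core mechanism the abstract describes, broadcasting initial slots to every time step and refining them jointly through a fixed stack of layers with causal attention over time, can be sketched in a few lines of PyTorch. The following is a minimal illustrative sketch, not the authors' implementation: the module names (PSBLayerSketch, PSBSketch), the cross-attention / time-attention / MLP ordering, and all hyperparameters are assumptions made for exposition.

import torch
import torch.nn as nn

class PSBLayerSketch(nn.Module):
    """One refinement layer: per-frame cross-attention from slots to
    visual tokens, then causal self-attention for each slot over time.
    (Hypothetical layer composition; only the parallel-over-time idea
    is taken from the abstract.)"""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, slots, feats):
        # slots: (B, T, S, D), feats: (B, T, N, D)
        B, T, S, D = slots.shape
        N = feats.shape[2]
        # (1) Bind slots to the visual tokens of their own frame,
        #     folding time into the batch so all frames run in parallel.
        q = self.norm1(slots).reshape(B * T, S, D)
        kv = feats.reshape(B * T, N, D)
        upd, _ = self.cross_attn(q, kv, kv)
        slots = slots + upd.reshape(B, T, S, D)
        # (2) Causal attention over time: each slot attends only to its
        #     own past states, so no information leaks from future frames.
        q = self.norm2(slots).permute(0, 2, 1, 3).reshape(B * S, T, D)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                     device=slots.device), diagonal=1)
        upd, _ = self.time_attn(q, q, q, attn_mask=mask)
        slots = slots + upd.reshape(B, S, T, D).permute(0, 2, 1, 3)
        # (3) Position-wise MLP.
        return slots + self.mlp(self.norm3(slots))

class PSBSketch(nn.Module):
    def __init__(self, dim=64, num_slots=6, num_layers=3):
        super().__init__()
        self.init_slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.layers = nn.ModuleList(PSBLayerSketch(dim) for _ in range(num_layers))

    def forward(self, feats):  # feats: (B, T, N, D) video features
        B, T = feats.shape[:2]
        # The same learned initial slots are broadcast to every time step,
        # then refined jointly by a fixed number of layers.
        slots = self.init_slots.expand(B, T, -1, -1).contiguous()
        for layer in self.layers:
            slots = layer(slots, feats)
        return slots  # (B, T, S, D): slots for all time steps at once

Usage on dummy features (shapes chosen arbitrarily):

model = PSBSketch()
video_feats = torch.randn(2, 8, 256, 64)  # 2 videos, 8 frames, 256 tokens/frame
slots = model(video_feats)                # -> (2, 8, 6, 64)

Because no step recurs over time, the whole sequence is processed in one forward pass, which is the source of the training-speed advantage over RNN-based slot models; the causal mask preserves the temporal ordering an RNN would otherwise enforce.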

Cite

Text

Singh et al. "Parallelized Spatiotemporal Slot Binding for Videos." International Conference on Machine Learning, 2024.

Markdown

[Singh et al. "Parallelized Spatiotemporal Slot Binding for Videos." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/singh2024icml-parallelized/)

BibTeX

@inproceedings{singh2024icml-parallelized,
  title     = {{Parallelized Spatiotemporal Slot Binding for Videos}},
  author    = {Singh, Gautam and Wang, Yue and Yang, Jiawei and Ivanovic, Boris and Ahn, Sungjin and Pavone, Marco and Che, Tong},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {45707--45733},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/singh2024icml-parallelized/}
}