Parallelized Spatiotemporal Slot Binding for Videos
Abstract
While modern best practices advocate for scalable architectures that support long-range interactions, object-centric models have yet to fully embrace them. In particular, existing object-centric models for sequential inputs rely on RNN-based implementations, which suffer from poor stability and capacity and are slow to train on long sequences. We introduce the Parallelizable Spatiotemporal Binder (PSB), the first temporally parallelizable slot-learning architecture for sequential inputs. Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time-steps in parallel. This is achieved by refining the initial slots across all time-steps through a fixed number of layers equipped with causal attention. By capitalizing on the parallelism this architecture induces, the proposed model exhibits a significant boost in efficiency. In experiments, we test PSB extensively as an encoder within an auto-encoding framework paired with a wide variety of decoders. Compared to the state of the art, our architecture trains stably on longer sequences, achieves a 60% increase in training speed through parallelization, and performs on par with or better on unsupervised 2D and 3D object-centric scene decomposition and understanding.
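To make the parallel refinement concrete, below is a minimal PyTorch sketch of one such layer. This is an illustration under assumptions, not the paper's implementation: the class name PSBLayerSketch, the ordering of temporal and slot attention, and all hyperparameters are invented here; the abstract only states that initial slots for all time-steps are refined in parallel through a fixed number of layers equipped with causal attention.

import torch
import torch.nn as nn

class PSBLayerSketch(nn.Module):
    # One refinement layer: causal attention across time-steps, then
    # self-attention among slots within each time-step, then a feed-forward
    # block. All structural choices here are assumptions for illustration.
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.slot_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, slots: torch.Tensor) -> torch.Tensor:
        # slots: (batch B, time T, slots S, dim D); all time-steps processed at once.
        B, T, S, D = slots.shape
        # Causal attention over time: each slot attends only to its own past.
        x = slots.permute(0, 2, 1, 3).reshape(B * S, T, D)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h, _ = self.temporal_attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + h)
        # Self-attention among the S slots within each time-step.
        x = x.reshape(B, S, T, D).permute(0, 2, 1, 3).reshape(B * T, S, D)
        h, _ = self.slot_attn(x, x, x)
        x = self.norm2(x + h)
        x = self.norm3(x + self.ff(x))
        return x.reshape(B, T, S, D)

# Usage: a fixed stack of layers refines slots for all frames in parallel,
# in contrast to an RNN that would step through the T frames sequentially.
layers = nn.ModuleList([PSBLayerSketch(64) for _ in range(3)])
slots = torch.randn(2, 8, 5, 64)  # 2 videos, 8 frames, 5 slots, dim 64
for layer in layers:
    slots = layer(slots)

A real encoder would also cross-attend the slots to per-frame visual features; that step is omitted here for brevity.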
Cite
Text
Singh et al. "Parallelized Spatiotemporal Slot Binding for Videos." International Conference on Machine Learning, 2024.
Markdown
[Singh et al. "Parallelized Spatiotemporal Slot Binding for Videos." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/singh2024icml-parallelized/)
BibTeX
@inproceedings{singh2024icml-parallelized,
title = {{Parallelized Spatiotemporal Slot Binding for Videos}},
author = {Singh, Gautam and Wang, Yue and Yang, Jiawei and Ivanovic, Boris and Ahn, Sungjin and Pavone, Marco and Che, Tong},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {45707--45733},
volume = {235},
url = {https://mlanthology.org/icml/2024/singh2024icml-parallelized/}
}