Compositional Visual Reasoning with SlotSSMs
Abstract
In many real-world sequence modeling problems, the underlying process is inherently modular, and it is important to design machine learning architectures that can leverage this modular structure. In this paper, we introduce SlotSSMs, a novel framework for incorporating independent mechanisms into State Space Models (SSMs), such as Mamba, to preserve or encourage separation of information, thereby improving visual reasoning. We evaluate SlotSSMs on long-sequence reasoning and real-world depth estimation tasks, demonstrating substantial performance improvements over existing sequence modeling methods. Our design efficiently exploits the modularity of inputs and scales effectively through the parallelizable architecture enabled by SSMs. We hope this approach will inspire future research on compositional reasoning architectures.
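To make the architectural idea in the abstract concrete, below is a minimal, hypothetical sketch of a slot-structured SSM block, assuming each slot keeps its own state vector, is updated by an independent diagonal linear recurrence, and interacts with other slots only through a self-attention step. The class name SlotSSMBlockSketch and all design details here are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a slot-structured SSM block (not the paper's code):
# each slot's state evolves under an independent diagonal recurrence, and
# cross-slot interaction happens only through self-attention over slots.
import torch
import torch.nn as nn


class SlotSSMBlockSketch(nn.Module):
    def __init__(self, num_slots: int, slot_dim: int, num_heads: int = 4):
        super().__init__()
        self.num_slots = num_slots
        self.slot_dim = slot_dim
        # Learnable per-dimension decay: a shared diagonal state transition.
        self.log_decay = nn.Parameter(torch.zeros(slot_dim))
        self.in_proj = nn.Linear(slot_dim, slot_dim)
        # Cross-slot interaction via self-attention at each time step.
        self.attn = nn.MultiheadAttention(slot_dim, num_heads, batch_first=True)
        self.out_proj = nn.Linear(slot_dim, slot_dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, time, num_slots, slot_dim), e.g. slot embeddings per frame
        B, T, K, D = inputs.shape
        state = torch.zeros(B, K, D, device=inputs.device)
        decay = torch.sigmoid(self.log_decay)  # (D,), per-dimension decay in (0, 1)
        outputs = []
        for t in range(T):
            u = self.in_proj(inputs[:, t])             # (B, K, D)
            state = decay * state + (1.0 - decay) * u  # independent per-slot updates
            mixed, _ = self.attn(state, state, state)  # slots exchange information
            state = state + mixed
            outputs.append(self.out_proj(state))
        return torch.stack(outputs, dim=1)             # (B, T, K, D)


if __name__ == "__main__":
    x = torch.randn(2, 8, 4, 32)  # batch=2, 8 frames, 4 slots, 32-dim slots
    block = SlotSSMBlockSketch(num_slots=4, slot_dim=32)
    print(block(x).shape)  # torch.Size([2, 8, 4, 32])

The explicit Python loop over time is only for readability; the parallelizable-architecture claim in the abstract suggests the per-slot recurrence would instead be computed with a parallel scan in practice.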
Cite
Text
Jiang et al. "Compositional Visual Reasoning with SlotSSMs." NeurIPS 2024 Workshops: Compositional_Learning, 2024.

Markdown
[Jiang et al. "Compositional Visual Reasoning with SlotSSMs." NeurIPS 2024 Workshops: Compositional_Learning, 2024.](https://mlanthology.org/neuripsw/2024/jiang2024neuripsw-compositional/)

BibTeX
@inproceedings{jiang2024neuripsw-compositional,
title = {{Compositional Visual Reasoning with SlotSSMs}},
author = {Jiang, Jindong and Deng, Fei and Singh, Gautam and Lee, Minseung and Ahn, Sungjin},
booktitle = {NeurIPS 2024 Workshops: Compositional_Learning},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/jiang2024neuripsw-compositional/}
}