ConvT3: Structured State Kernels for Convolutional State Space Models

Abstract

Modeling long spatiotemporal sequences requires capturing both complex spatial correlations and temporal dependencies. Convolutional State Space Models (ConvSSMs) have been proposed to incorporate spatial modeling into State Space Models (SSMs) by convolving tensor-valued states with kernels. Yet existing implementations remain limited to $1\times 1$ state kernels for computational feasibility, which restricts the modeling capacity of ConvSSMs. We introduce a novel spatiotemporal model, ConvT3 (ConvSSM using Tridiagonal Toeplitz Tensors), designed to equivalently realize ConvSSMs with extended $3\times 3$ state kernels. ConvT3 structures each state kernel so that its corresponding tensor is composed of a structured SSM matrix on the hidden state dimensions and a constrained tridiagonal Toeplitz tensor on the spatial dimensions. We show that the structured tensor can be diagonalized, which enables efficient parallel training while leveraging $3\times 3$ state convolutions. We demonstrate that ConvT3 effectively embeds rich spatial and temporal information into the dynamics of tensor-valued states, achieving state-of-the-art performance on most metrics in long-range video generation and physical system modeling.
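To give some intuition for why a tridiagonal Toeplitz structure can be diagonalized cheaply, the sketch below looks at the simplest analogue: a one-dimensional 3-tap convolution with zero padding, which acts on a length-$n$ signal as an $n\times n$ tridiagonal Toeplitz matrix. Such a matrix has well-known closed-form eigenpairs. This is only an illustration under stated assumptions, not the authors' implementation: the function names and numeric values are hypothetical, both off-diagonal entries are assumed positive, and the abstract does not specify how ConvT3 constrains or diagonalizes its tensor.

```python
import math

def tridiag_toeplitz(n, a, b, c):
    """Build an n x n matrix with a on the diagonal, b above it, c below it."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = a
        if i + 1 < n:
            T[i][i + 1] = b  # superdiagonal
            T[i + 1][i] = c  # subdiagonal
    return T

def closed_form_eigenpairs(n, a, b, c):
    """Closed-form eigenpairs; this sketch assumes b > 0 and c > 0."""
    assert b > 0 and c > 0, "illustrative assumption: positive off-diagonals"
    r = math.sqrt(c / b)
    pairs = []
    for k in range(1, n + 1):
        theta = k * math.pi / (n + 1)
        lam = a + 2.0 * math.sqrt(b * c) * math.cos(theta)
        vec = [r ** j * math.sin(j * theta) for j in range(1, n + 1)]
        pairs.append((lam, vec))
    return pairs

def residual(T, lam, vec):
    """Largest absolute entry of T @ vec - lam * vec."""
    n = len(vec)
    return max(
        abs(sum(T[i][j] * vec[j] for j in range(n)) - lam * vec[i])
        for i in range(n)
    )

if __name__ == "__main__":
    n, a, b, c = 8, 0.5, 0.2, 0.3  # hypothetical kernel taps
    T = tridiag_toeplitz(n, a, b, c)
    worst = max(residual(T, lam, vec) for lam, vec in closed_form_eigenpairs(n, a, b, c))
    print("max eigen-residual:", worst)
```

Because the eigenvectors depend only on the ratio $c/b$ and the eigenvalues are available in closed form, no numerical eigensolver is needed in this one-dimensional case. Once a linear recurrence is diagonal, it can be evaluated independently per mode, which is the general reason diagonalizable state transitions tend to permit parallel training.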

Cite

Text

Hong et al. "ConvT3: Structured State Kernels for Convolutional State Space Models." International Conference on Learning Representations, 2026.

Markdown

[Hong et al. "ConvT3: Structured State Kernels for Convolutional State Space Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hong2026iclr-convt3/)

BibTeX

@inproceedings{hong2026iclr-convt3,
  title     = {{ConvT3: Structured State Kernels for Convolutional State Space Models}},
  author    = {Hong, Jaeyoung and Choi, Yun Young and Ko, Joohwan and Gwak, Minseon},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/hong2026iclr-convt3/}
}