Length Independent Generalization Bounds for Deep SSM Architectures via Rademacher Contraction and Stability Constraints
Abstract
Deep SSM models such as S4, S5, and the LRU are composed of sequential blocks combining State-Space Model (SSM) layers with neural networks, and achieve excellent performance at learning representations of long-range sequences. In this paper we provide a PAC bound on the generalization error of non-selective architectures with stable SSM blocks that does not depend on the length of the input sequence. Imposing stability on the SSM blocks is standard practice in the literature and is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks: the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
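The stability constraint referenced in the abstract means that each SSM block's state matrix has spectral radius strictly below one, so the recurrence's memory of past inputs decays geometrically with the time lag. Below is a minimal NumPy sketch, not the paper's construction, of a diagonal SSM block that is stable by construction; all names are illustrative, and the magnitude reparameterization |lambda_j| = exp(-exp(nu_j)) loosely follows the LRU-style approach.

```python
import numpy as np

class StableDiagonalSSM:
    """Diagonal linear SSM block x_{t+1} = A x_t + B u_t, y_t = C x_t,
    with A = diag(lambda) constrained to spectral radius < 1 via
    |lambda_j| = exp(-exp(nu_j)) (an LRU-style reparameterization)."""

    def __init__(self, state_dim, in_dim, out_dim, rng=None):
        rng = np.random.default_rng(rng)
        # Unconstrained real parameters; nu controls the degree of stability.
        self.nu = rng.normal(size=state_dim)               # |lambda| = exp(-exp(nu))
        self.theta = rng.uniform(0, 2 * np.pi, state_dim)  # phase of lambda
        self.B = rng.normal(size=(state_dim, in_dim)) / np.sqrt(in_dim)
        self.C = rng.normal(size=(out_dim, state_dim)) / np.sqrt(state_dim)

    def eigenvalues(self):
        # Stable for every parameter setting: exp(-exp(nu)) lies in (0, 1)
        # for any real nu, so no projection or penalty is needed.
        return np.exp(-np.exp(self.nu)) * np.exp(1j * self.theta)

    def forward(self, u):
        # u: (T, in_dim) input sequence; returns a (T, out_dim) output sequence.
        lam = self.eigenvalues()
        x = np.zeros(lam.shape, dtype=complex)
        ys = []
        for u_t in u:
            x = lam * x + self.B @ u_t
            ys.append((self.C @ x).real)
        return np.stack(ys)

ssm = StableDiagonalSSM(state_dim=8, in_dim=3, out_dim=2, rng=0)
y = ssm.forward(np.random.default_rng(1).normal(size=(100, 3)))
print(y.shape, np.abs(ssm.eigenvalues()).max())  # (100, 2), strictly below 1
```

In this parameterization a larger exp(nu_j) means a faster-decaying mode, i.e. a higher degree of stability; the abstract's claim is that the generalization bound shrinks as this degree of stability grows, independently of the sequence length T.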
Cite
Rácz et al. "Length Independent Generalization Bounds for Deep SSM Architectures via Rademacher Contraction and Stability Constraints." Transactions on Machine Learning Research, 2025. https://mlanthology.org/tmlr/2025/racz2025tmlr-length/

BibTeX:
@article{racz2025tmlr-length,
title = {{Length Independent Generalization Bounds for Deep SSM Architectures via Rademacher Contraction and Stability Constraints}},
author = {Rácz, Dániel and Petreczky, Mihály and Daróczy, Bálint},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/racz2025tmlr-length/}
}