Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters

Abstract

Linear sequence modeling methods, such as linear attention, state space models, and linear RNNs, have recently emerged as potential alternatives to softmax attention thanks to their linear complexity and competitive performance. However, although their linear memory footprint during training makes long sequences tractable, extremely long sequences remain hard to handle on very limited computational resources. In this paper, we propose Sequence Accumulation (SA), which leverages the recurrence shared by linear sequence modeling methods to support infinite context length even on a single GPU. Specifically, SA divides long input sequences into fixed-length sub-sequences and accumulates the intermediate states sequentially, which requires only constant memory. We further propose Sequence Accumulation with Pipeline Parallelism (SAPP) to train large models with infinite context length without incurring any additional synchronization cost in the sequence dimension. Extensive experiments across a wide range of context lengths validate the effectiveness of SA and SAPP on both single and multiple GPUs. The results show that SA and SAPP enable training with infinite context length even on very limited resources, and that both are fully compatible with out-of-the-box distributed training techniques.
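As a rough illustration of the chunked-recurrence idea (not the authors' code), the sketch below applies sequence accumulation to a causal linear-attention recurrence S_t = S_{t-1} + k_t v_t^T, o_t = q_t S_t: the sequence is split into fixed-length sub-sequences and a single d x d state is carried across them, so the recurrent state occupies constant memory regardless of total sequence length. All names (sequence_accumulation, linear_attention_chunk, chunk_len) are illustrative assumptions, not the paper's API.

import torch

def linear_attention_chunk(q, k, v, state):
    # q, k, v: (chunk_len, d); state: (d, d) carried in from the previous chunk.
    outs = []
    for t in range(q.shape[0]):
        state = state + torch.outer(k[t], v[t])  # S_t = S_{t-1} + k_t v_t^T
        outs.append(q[t] @ state)                # o_t = q_t S_t
    return torch.stack(outs), state

def sequence_accumulation(q, k, v, chunk_len=256):
    # Iterate over fixed-length sub-sequences, accumulating one recurrent state;
    # per-chunk activation memory is O(chunk_len * d + d^2), independent of L.
    d = q.shape[-1]
    state = torch.zeros(d, d)
    outputs = []
    for start in range(0, q.shape[0], chunk_len):
        out, state = linear_attention_chunk(
            q[start:start + chunk_len],
            k[start:start + chunk_len],
            v[start:start + chunk_len],
            state,
        )
        outputs.append(out)  # collected here only for demonstration
    return torch.cat(outputs)

# usage
L, d = 1024, 16
q, k, v = (torch.randn(L, d) for _ in range(3))
out = sequence_accumulation(q, k, v, chunk_len=128)
print(out.shape)  # torch.Size([1024, 16])

This forward-only sketch collects all chunk outputs for demonstration; in practice constant memory follows from consuming each chunk's output (e.g., computing its loss) before moving to the next chunk, and training additionally requires handling gradients across chunks, which the sketch omits.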

Cite

Text

Sun et al. "Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I19.34284

Markdown

[Sun et al. "Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/sun2025aaai-sequence/) doi:10.1609/AAAI.V39I19.34284

BibTeX

@inproceedings{sun2025aaai-sequence,
  title     = {{Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters}},
  author    = {Sun, Weigao and Liu, Yongtuo and Tang, Xiaqiang and Mo, Xiaoyu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {20725--20733},
  doi       = {10.1609/AAAI.V39I19.34284},
  url       = {https://mlanthology.org/aaai/2025/sun2025aaai-sequence/}
}