TNT: Improving Chunkwise Training for Test-Time Memorization

Li, Zeman; Behrouz, Ali; Deng, Yuan; Zhong, Peilin; Kacham, Praneeth; Karami, Mahdi; Razaviyayn, Meisam; Mirrokni, Vahab

ICLR 2026

Abstract

Recurrent neural networks (RNNs) with deep test-time memorization modules, such as Titans and TTT, represent a promising, linearly-scaling paradigm distinct from Transformers. While these expressive models do not yet match the peak performance of state-of-the-art Transformers, their potential has been largely untapped due to prohibitively slow training and low hardware utilization. Existing parallelization methods force a fundamental conflict governed by the chunksize hyperparameter: large chunks boost speed but degrade performance, necessitating a fixed, suboptimal compromise. To solve this challenge, we introduce TNT, a novel training paradigm that decouples training efficiency from inference performance through a two-stage process. Stage one is an efficiency-focused pre-training phase utilizing a hierarchical memory. A global module processes large, hardware-friendly chunks for long-range context, while multiple parallel local modules handle fine-grained details. Crucially, by periodically resetting local memory states, we break sequential dependencies to enable massive context parallelization. Stage two is a brief fine-tuning phase where only the local memory modules are adapted to a smaller, high-resolution chunksize, maximizing accuracy with minimal overhead. Evaluated on Titans and TTT models, TNT achieves a substantial acceleration in training speed—up to 17$\times$ faster than the most accurate baseline configuration—while simultaneously improving model accuracy. This improvement removes a critical scalability barrier, establishing a practical foundation for developing expressive RNNs and facilitating future work to close the performance gap with Transformers.
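The role of the periodic reset can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a scalar memory `w` updated by a chunkwise delta rule, and every name in it (`local_pass`, `segment`, `chunk`, `lr`) is hypothetical. It only shows that once the local state is reset to the same initial value at each segment boundary, the segments no longer depend on one another and can be processed in any order, or concurrently, with identical results.

```python
from concurrent.futures import ThreadPoolExecutor


def local_pass(keys, values, w0, chunk, lr):
    """Chunkwise delta-rule updates of a toy scalar memory over one segment."""
    w = w0
    for start in range(0, len(keys), chunk):
        ks = keys[start:start + chunk]
        vs = values[start:start + chunk]
        # Every gradient inside a chunk is evaluated at the chunk-start state w,
        # so a larger chunk means fewer sequential steps but coarser updates.
        grad = sum((v - w * k) * k for k, v in zip(ks, vs))
        w = w + lr * grad
    return w


def split(seq, segment):
    """Cut a sequence into consecutive segments of length `segment`."""
    return [seq[i:i + segment] for i in range(0, len(seq), segment)]


def sequential_with_resets(keys, values, w0, segment, chunk, lr):
    """Process segments one after another, resetting the memory to w0 each time."""
    finals = []
    for ks, vs in zip(split(keys, segment), split(values, segment)):
        finals.append(local_pass(ks, vs, w0, chunk, lr))
    return finals


def parallel_with_resets(keys, values, w0, segment, chunk, lr):
    """Same computation, but segments are dispatched independently.

    Threads are used only to demonstrate independence; they are an assumption
    of this sketch and do not model accelerator-level parallelism.
    """
    pairs = list(zip(split(keys, segment), split(values, segment)))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda kv: local_pass(kv[0], kv[1], w0, chunk, lr), pairs))


# Toy data: values are exactly 2 * key, so the ideal memory is w = 2.
keys = [0.1 * (i % 7 + 1) for i in range(32)]
values = [2.0 * k for k in keys]

seq = sequential_with_resets(keys, values, 0.0, segment=8, chunk=2, lr=0.1)
par = parallel_with_resets(keys, values, 0.0, segment=8, chunk=2, lr=0.1)
assert seq == par  # resets make the per-segment results order-independent
```

Without the reset, each segment would have to start from the final state of the previous one, which forces the segments to be processed strictly in order. In this toy setup, `chunk` plays the role of the chunksize hyperparameter discussed in the abstract.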

Cite

Text

Li et al. "TNT: Improving Chunkwise Training for Test-Time Memorization." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "TNT: Improving Chunkwise Training for Test-Time Memorization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-tnt/)

BibTeX

@inproceedings{li2026iclr-tnt,
  title     = {{TNT: Improving Chunkwise Training for Test-Time Memorization}},
  author    = {Li, Zeman and Behrouz, Ali and Deng, Yuan and Zhong, Peilin and Kacham, Praneeth and Karami, Mahdi and Razaviyayn, Meisam and Mirrokni, Vahab},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-tnt/}
}