Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Abstract
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We train a proof-of-concept model from scratch with 3.5 billion parameters and 800 billion tokens. We show that this model can effortlessly use varying levels of compute, improving significantly with additional compute, especially on reasoning tasks such as math and coding. Further, this architecture naturally reduces compute costs via zero-shot per-token adaptive compute, KV-cache sharing, and speculative decoding.
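As a rough illustration of the idea described in the abstract, the following is a minimal PyTorch sketch of a language model whose depth comes from iterating a single recurrent block, with the number of iterations chosen at inference time. The module names, layer choices, dimensions, and the noise-initialized latent state are illustrative assumptions for this sketch, not the authors' exact implementation; attention masking and training details are omitted.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Sketch of a depth-recurrent LM: one core block is unrolled a variable
    number of times at test time instead of stacking many distinct layers."""
    def __init__(self, vocab_size=32000, d_model=512, nhead=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Illustrative stages: a fixed input encoder, a recurrent core, and an output head.
        self.prelude = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.core = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.adapter = nn.Linear(2 * d_model, d_model)  # re-injects the input encoding each step
        self.coda = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, num_steps=4):
        e = self.prelude(self.embed(tokens))      # fixed-cost encoding of the input tokens
        s = torch.randn_like(e)                   # latent state initialized from noise (assumption)
        for _ in range(num_steps):                # test-time compute = number of core iterations
            s = self.core(self.adapter(torch.cat([s, e], dim=-1)))
        return self.coda(s)                       # logits read out from the final latent state

model = RecurrentDepthLM()
tokens = torch.randint(0, 32000, (1, 16))
logits_cheap = model(tokens, num_steps=2)    # low test-time compute
logits_deep = model(tokens, num_steps=16)    # more latent-space computation, same parameters
```

The key property the sketch tries to capture is that additional reasoning happens in the latent state `s` rather than in generated tokens, so compute can be scaled per query without changing the parameter count or emitting a longer chain of thought.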
Cite
Text
Geiping et al. "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach." Advances in Neural Information Processing Systems, 2025.
Markdown
[Geiping et al. "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/geiping2025neurips-scaling/)
BibTeX
@inproceedings{geiping2025neurips-scaling,
  title     = {{Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach}},
  author    = {Geiping, Jonas and McLeish, Sean Michael and Jain, Neel and Kirchenbauer, John and Singh, Siddharth and Bartoldson, Brian R. and Kailkhura, Bhavya and Bhatele, Abhinav and Goldstein, Tom},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/geiping2025neurips-scaling/}
}