MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
Abstract
Large Language Models have demonstrated strong reasoning capabilities by generating step-by-step reasoning in natural language before deriving the final answer. Recently, Geiping et al. introduced 3.5B-Huginn as an alternative to this paradigm: a depth-recurrent Transformer that increases computational depth per token by reusing a recurrent block in latent space. Although its performance improves as the number of recurrences grows, this approach is inadequate for tasks demanding exploration and adaptivity, a limitation arising from its single, chain-like propagation mechanism. To address this, we propose a novel dynamic multi-branch routing approach for Huginn, termed the Mixture-of-Depth-Recurrent (MoDr) Transformer, which enables effective exploration of the solution space by shifting linear latent reasoning into a LoRA-based multi-branch dynamic relay mode with learnable hard-gate routing. In addition, we introduce an auxiliary-loss-free load balancing strategy to mitigate potential routing collapse. Our empirical results show that MoDr achieves average accuracy improvements of +7.2% and +2.48% over the original Huginn model and its fine-tuned variant, respectively, across various mathematical reasoning benchmarks, and improvements of +21.21% and +1.52%, respectively, on commonsense reasoning benchmarks.
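The routing idea in the abstract can be illustrated with a toy sketch. The abstract does not specify the architecture in detail, so everything below is an assumption made for illustration: the names (`lora_branch`, `route`, `update_bias`, `recur`), the `tanh` stand-in for the shared recurrent block, the argmax form of the hard gate, and the bias-nudging rule used for load balancing without an auxiliary loss. It is not the authors' implementation.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_branch(x, W, A, B):
    """One recurrence through the shared block W plus a low-rank branch update.

    Computes tanh(W x + B (A x)); A is (r x d) and B is (d x r).
    The tanh block is a hypothetical stand-in for the real recurrent block.
    """
    shared = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [math.tanh(s + d) for s, d in zip(shared, delta)]

def route(x, gate_W, bias):
    """Hard gate: pick the single branch with the highest biased gate score."""
    logits = matvec(gate_W, x)
    scores = [l + b for l, b in zip(logits, bias)]
    return max(range(len(scores)), key=scores.__getitem__)

def update_bias(bias, counts, step=0.01):
    """Assumed balancing rule: nudge the selection bias toward underused branches.

    No extra loss term is involved; only the selection bias changes.
    """
    mean = sum(counts) / len(counts)
    return [b + step if c < mean else b - step if c > mean else b
            for b, c in zip(bias, counts)]

def recur(x, W, branches, gate_W, bias, num_steps):
    """Relay the latent state through one routed branch per recurrence."""
    path = []
    for _ in range(num_steps):
        k = route(x, gate_W, bias)
        A, B = branches[k]
        x = lora_branch(x, W, A, B)
        path.append(k)
    return x, path

# Toy setup: hidden size 2, rank 1, two branches (all values are made up).
W = [[0.5, 0.0], [0.0, 0.5]]
branches = [
    ([[1.0, 0.0]], [[0.1], [0.0]]),
    ([[0.0, 1.0]], [[0.0], [0.1]]),
]
gate_W = [[1.0, 0.0], [0.0, 1.0]]
state, path = recur([1.0, 0.0], W, branches, gate_W, [0.0, 0.0], num_steps=3)
```

In this toy setup the gate keeps choosing the same branch, which is the kind of routing collapse a balancing step is meant to counteract; calling `update_bias` on the per-branch selection counts would gradually shift selection toward the unused branch.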
Cite
Text
Zhang et al. "MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning." International Conference on Learning Representations, 2026.
Markdown
[Zhang et al. "MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-modr/)
BibTeX
@inproceedings{zhang2026iclr-modr,
title = {{MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning}},
author = {Zhang, Xiaojing and Wu, Haifeng and He, Gang and Shen, Jiyang and Lyu, Bochen and Zhu, Zhanxing},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zhang2026iclr-modr/}
}