TrimR: Verifier-Based Training-Free Thinking Trimming for Efficient Test-Time Scaling

Lin, Weizhe; Li, Xing; Yang, Zhiyuan; Fu, Xiaojin; Zhen, Hui-Ling; Wang, Yaoyuan; Yu, Xianzhi; Liu, Wulong; Li, Xiaosong; Yuan, Mingxuan

ICLR 2026

Abstract

Large Reasoning Models (LRMs) demonstrate exceptional capability in tackling complex mathematical, logical, and coding tasks by leveraging extended Chain-of-Thought (CoT) reasoning. Test-time scaling methods, such as prolonging CoT with explicit token-level exploration, can push the accuracy boundaries of LRMs, but they incur significant decoding overhead. A key source of inefficiency is that LRMs often generate redundant thinking CoTs that exhibit clear, structured overthinking and underthinking patterns. Inspired by human cognitive reasoning processes and numerical optimization theory, we propose TrimR, a verifier-based, training-free, efficient framework that trims reasoning and enhances test-time scaling, explicitly tailored for production-level deployment. Our method employs a lightweight, pretrained, instruction-tuned verifier to detect and truncate redundant intermediate thoughts of LRMs, without fine-tuning either the LRM or the verifier. We present both the core algorithm and an asynchronous online system engineered for high-throughput industrial applications. Empirical evaluations on Ascend NPUs and vLLM show that our framework delivers substantial gains in inference efficiency under large-batch workloads. In particular, on four benchmarks (MATH500, AIME24, AIME25, and GPQA), the reasoning runtime of QwQ-32B, DeepSeek-R1-Distill-Qwen-32B, and Pangu-R-38B is improved by up to 70% with negligible impact on accuracy.
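To make the general idea of verifier-based thinking trimming concrete, here is a minimal Python sketch of one plausible early-stopping rule. It is not the paper's algorithm. It assumes (hypothetically) that a thinking trace can be split into thoughts at reflection cue words such as "Wait" or "Alternatively", and that a verifier maps each thought to the answer it commits to, or None. Decoding is cut once the same answer appears `patience` times in a row among the thoughts that reach an answer. The names `split_thoughts`, `trim_thinking`, `toy_verifier`, and `patience` are illustrative, and the regex verifier is only a stand-in for a small instruction-tuned model.

```python
import re
from typing import Callable, List, Optional

# Hypothetical reflection cues used to segment a trace into "thoughts".
# The segmentation rule actually used by TrimR is not described here.
CUE_PATTERN = re.compile(r"(?=\b(?:Wait|Alternatively)\b)")


def split_thoughts(trace: str) -> List[str]:
    """Split a thinking trace before each assumed reflection cue word."""
    return [t.strip() for t in CUE_PATTERN.split(trace) if t.strip()]


def toy_verifier(thought: str) -> Optional[str]:
    """Stand-in verifier: reads 'answer is X' instead of querying a model."""
    match = re.search(r"answer is (\w+)", thought)
    return match.group(1) if match else None


def trim_thinking(
    thoughts: List[str],
    verifier: Callable[[str], Optional[str]],
    patience: int = 2,
) -> List[str]:
    """Keep thoughts until the same answer is reached `patience` times in a row."""
    kept: List[str] = []
    last_answer: Optional[str] = None
    streak = 0
    for thought in thoughts:
        kept.append(thought)
        answer = verifier(thought)
        if answer is None:
            continue  # no conclusion in this thought; it does not reset the streak
        streak = streak + 1 if answer == last_answer else 1
        last_answer = answer
        if streak >= patience:
            break  # answer re-confirmed: treat the remaining thoughts as redundant
    return kept


trace = (
    "Compute 6*7. So the answer is 42. "
    "Wait, let me double-check: 6*7 = 42, so the answer is 42. "
    "Alternatively, 7*6 = 42, so the answer is 42."
)
thoughts = split_thoughts(trace)
trimmed = trim_thinking(thoughts, toy_verifier, patience=2)
print(len(thoughts), len(trimmed))  # 3 2
```

Here the third thought only repeats a conclusion that has already been confirmed, so it is dropped. In a real system the verifier call would go to a lightweight model rather than a regex; the toy version only keeps the sketch self-contained and runnable.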

Cite

Text

Lin et al. "TrimR: Verifier-Based Training-Free Thinking Trimming for Efficient Test-Time Scaling." International Conference on Learning Representations, 2026.

Markdown

[Lin et al. "TrimR: Verifier-Based Training-Free Thinking Trimming for Efficient Test-Time Scaling." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lin2026iclr-trimr/)

BibTeX

@inproceedings{lin2026iclr-trimr,
  title     = {{TrimR: Verifier-Based Training-Free Thinking Trimming for Efficient Test-Time Scaling}},
  author    = {Lin, Weizhe and Li, Xing and Yang, Zhiyuan and Fu, Xiaojin and Zhen, Hui-Ling and Wang, Yaoyuan and Yu, Xianzhi and Liu, Wulong and Li, Xiaosong and Yuan, Mingxuan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/lin2026iclr-trimr/}
}