ContextPRM: Leveraging Contextual Coherence for Multi-Domain Test-Time Scaling

Zhang, Haotian; Liu, Liu; Yu, Baosheng; Qiu, Jiayan; Xiao, Likang; Ren, Yanwei; Chen, Quan; Liu, Xianglong

ICLR 2026

Abstract

Process reward models (PRMs) have proven effective at improving the mathematical reasoning of large language models (LLMs) through test-time scaling (TTS). However, while most PRMs deliver substantial gains in mathematical domains, the scarcity of domain-specific training data and knowledge-based learning patterns limit their ability to generalize to other domains. To address this limitation, we shift the learning objective from verifying domain-specific knowledge to modeling domain-agnostic logical flow. Our approach centers on contextual coherence between chain-of-thought (CoT) steps and is realized through a novel data annotation and training framework, which improves the model's generalization across diverse domains. For instance, using weighted majority voting, our resulting model, ContextPRM, achieves a 6.5% average accuracy improvement over the majority voting baseline across nine non-mathematical domains in MMLU-Pro, including law, history, and philosophy. This significantly surpasses the 2.2% improvement from VersaPRM and the 0.5% gains from other mathematics-focused PRMs, and ContextPRM performs consistently across both mathematical and non-mathematical domains.
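The abstract compares weighted majority voting against plain majority voting. The following is a minimal sketch of that difference, not the authors' implementation. The function names and the toy data are hypothetical, and collapsing each solution's per-step PRM scores with `min` is an assumption made here for illustration; the abstract does not say how ContextPRM aggregates step scores.

```python
from collections import defaultdict


def majority_vote(answers):
    """Return the most frequent final answer (ties go to the earliest seen)."""
    counts = defaultdict(int)
    for answer in answers:
        counts[answer] += 1
    return max(counts, key=counts.get)


def weighted_majority_vote(answers, step_scores):
    """Return the answer whose sampled solutions carry the largest total weight.

    Each solution's weight is the minimum of its per-step scores.
    ASSUMPTION: min-aggregation is an illustrative choice, not taken from the paper.
    Every solution is assumed to have at least one scored step.
    """
    totals = defaultdict(float)
    for answer, scores in zip(answers, step_scores):
        totals[answer] += min(scores)
    return max(totals, key=totals.get)


# Hypothetical toy data: three sampled solutions to one multiple-choice question.
answers = ["B", "B", "A"]
step_scores = [
    [0.9, 0.2],   # solution 1: one weakly supported step
    [0.8, 0.3],   # solution 2: one weakly supported step
    [0.9, 0.95],  # solution 3: every step scored highly
]

plain_choice = majority_vote(answers)
weighted_choice = weighted_majority_vote(answers, step_scores)
```

In this toy case plain voting picks "B" because it appears twice, while the weighted vote picks "A": the two "B" solutions contribute 0.2 and 0.3, less than the single "A" solution's 0.9.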

Cite

Text

Zhang et al. "ContextPRM: Leveraging Contextual Coherence for Multi-Domain Test-Time Scaling." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "ContextPRM: Leveraging Contextual Coherence for Multi-Domain Test-Time Scaling." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-contextprm/)

BibTeX

@inproceedings{zhang2026iclr-contextprm,
  title     = {{ContextPRM: Leveraging Contextual Coherence for Multi-Domain Test-Time Scaling}},
  author    = {Zhang, Haotian and Liu, Liu and Yu, Baosheng and Qiu, Jiayan and Xiao, Likang and Ren, Yanwei and Chen, Quan and Liu, Xianglong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-contextprm/}
}