Reinforcing General Reasoning Without Verifiers

Zhou, Xiangxin; Liu, Zichen; Sims, Anya; Wang, Haonan; Pang, Tianyu; Li, Chongxuan; Wang, Liang; Lin, Min; Du, Chao

ICLR 2026

Abstract

The recent paradigm shift towards training large language models (LLMs) using DeepSeek-R1-Zero-style reinforcement learning (RL) on verifiable rewards has led to impressive advancements in code and mathematical reasoning. However, this methodology is limited to tasks where rule-based answer verification is possible and does not naturally extend to real-world domains such as chemistry, healthcare, engineering, law, biology, business, and economics. Current practical workarounds use an additional LLM as a model-based verifier; however, this introduces issues such as reliance on a strong verifier LLM, susceptibility to reward hacking, and the practical burden of maintaining the verifier model in memory during training. To address this and extend DeepSeek-R1-Zero-style training to general reasoning domains, we propose a verifier-free method (**VeriFree**) that bypasses answer verification and instead directly maximizes the probability of generating the reference answer, derived in a principled way from the RL objective. We compare VeriFree with verifier-based methods and demonstrate that, in addition to its significant practical benefits and reduced compute requirements, VeriFree matches and even surpasses verifier-based methods on extensive evaluations across MMLU-Pro, GPQA, SuperGPQA, and math-related benchmarks.
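
The abstract says VeriFree maximizes the probability of the reference answer and that this follows from the RL objective. Here is a minimal sketch of why, under two assumptions that are ours and not stated above: the verifier reward is 1 exactly when the final answer y equals a single reference answer y*, and a response splits into a reasoning trace z followed by an answer y. Under those assumptions the expected reward becomes J(θ) = E_{z ~ π_θ(·|x)}[π_θ(y* | x, z)]. Its gradient is E_z[π_θ(y*|x,z) ∇log π_θ(z|x) + ∇π_θ(y*|x,z)], so no verifier call is needed. The code checks this identity numerically on a hypothetical tabular toy model: softmax logits `a` over three traces and per-trace answer logits `B`. All function names are illustrative. This is not the authors' implementation, and any baselines, variance reduction, or LLM-specific details the paper uses are not reproduced here.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def objective(a, B, y_star):
    """J = sum_z pi(z|x) * pi(y*|x,z): the chance that the final answer equals y*."""
    pz = softmax(a)
    return sum(pz[z] * softmax(B[z])[y_star] for z in range(len(a)))


def verifree_grad_sample(a, B, y_star, z):
    """Gradient estimate from one sampled trace z (no verifier call).

    Score-function term R(z) * grad log pi(z|x) with R(z) = pi(y*|x,z),
    plus the direct term grad pi(y*|x,z) = R(z) * grad log pi(y*|x,z).
    """
    pz = softmax(a)
    py = softmax(B[z])
    reward = py[y_star]
    # grad of log softmax(a)[z] w.r.t. a_k is 1{k == z} - pz[k]
    grad_a = [reward * ((1.0 if k == z else 0.0) - pz[k]) for k in range(len(a))]
    # Only row z of B affects pi(y*|x,z); other rows get zero gradient
    grad_B = [[0.0] * len(row) for row in B]
    grad_B[z] = [reward * ((1.0 if m == y_star else 0.0) - py[m]) for m in range(len(py))]
    return grad_a, grad_B


def expected_grad(a, B, y_star):
    """Exact expectation of the per-sample estimator, by enumerating every z."""
    pz = softmax(a)
    grad_a = [0.0] * len(a)
    grad_B = [[0.0] * len(row) for row in B]
    for z, w in enumerate(pz):
        ga_z, gB_z = verifree_grad_sample(a, B, y_star, z)
        grad_a = [g + w * d for g, d in zip(grad_a, ga_z)]
        grad_B = [[g + w * d for g, d in zip(r, dr)] for r, dr in zip(grad_B, gB_z)]
    return grad_a, grad_B


def finite_diff_grad(a, B, y_star, eps=1e-6):
    """Central finite differences of J, used only as a reference."""
    def bump(k, delta):
        shifted = [v + (delta if i == k else 0.0) for i, v in enumerate(a)]
        return objective(shifted, B, y_star)

    grad_a = [(bump(k, eps) - bump(k, -eps)) / (2 * eps) for k in range(len(a))]
    grad_B = []
    for i, row in enumerate(B):
        grad_row = []
        for m in range(len(row)):
            Bp = [r[:] for r in B]
            Bm = [r[:] for r in B]
            Bp[i][m] += eps
            Bm[i][m] -= eps
            grad_row.append((objective(a, Bp, y_star) - objective(a, Bm, y_star)) / (2 * eps))
        grad_B.append(grad_row)
    return grad_a, grad_B


# Toy check: 3 possible reasoning traces, 4 possible answers, reference answer y* = 2.
random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(3)]
B = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
ga, gB = expected_grad(a, B, y_star=2)
fa, fB = finite_diff_grad(a, B, y_star=2)
err_a = max(abs(u - v) for u, v in zip(ga, fa))
err_B = max(abs(u - v) for r1, r2 in zip(gB, fB) for u, v in zip(r1, r2))
print(f"max |analytic - finite difference|: a={err_a:.2e}, B={err_B:.2e}")
```

In actual training one would sample traces z from the policy and average `verifree_grad_sample` over them rather than enumerate. The enumeration here only makes the expectation exact, so it can be compared directly against finite differences of J.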

Cite

Text

Zhou et al. "Reinforcing General Reasoning Without Verifiers." International Conference on Learning Representations, 2026.

Markdown

[Zhou et al. "Reinforcing General Reasoning Without Verifiers." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhou2026iclr-reinforcing/)

BibTeX

@inproceedings{zhou2026iclr-reinforcing,
  title     = {{Reinforcing General Reasoning Without Verifiers}},
  author    = {Zhou, Xiangxin and Liu, Zichen and Sims, Anya and Wang, Haonan and Pang, Tianyu and Li, Chongxuan and Wang, Liang and Lin, Min and Du, Chao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhou2026iclr-reinforcing/}
}