Revisiting Reinforcement Learning for LLM Reasoning from a Cross-Domain Perspective

Abstract

Reinforcement learning (RL) has shown promise in enhancing large language model (LLM) reasoning, yet progress towards broader capabilities is limited by the availability of high-quality, multi-domain datasets. This work introduces \ours, a 92K RL-for-reasoning dataset designed to address this gap, covering six reasoning domains: Math, Code, Science, Logic, Simulation, and Tabular, each with corresponding verifiers. We build \ours via a careful data-curation pipeline, including sourcing, deduplication, reward design, and domain-specific and difficulty-based filtering, to facilitate the systematic investigation of cross-domain RL generalization. Our study using \ours suggests the efficacy of a simple mixed-domain RL training approach and reveals several key aspects affecting cross-domain transferability. We further train two models, {\ours}-7B and {\ours}-32B, purely with RL on our curated data and observe largely improved performance over leading open RL reasoning model baselines, with gains of 7.3% and 7.8%, respectively, on an extensive 17-task, six-domain evaluation suite. We are releasing our dataset, code, and evaluation suite to the community, aiming to support further research and development of more general RL-enhanced reasoning models.

Cite

Text

Cheng et al. "Revisiting Reinforcement Learning for LLM Reasoning from a Cross-Domain Perspective." Advances in Neural Information Processing Systems, 2025.

Markdown

[Cheng et al. "Revisiting Reinforcement Learning for LLM Reasoning from a Cross-Domain Perspective." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/cheng2025neurips-revisiting/)

BibTeX

@inproceedings{cheng2025neurips-revisiting,
  title     = {{Revisiting Reinforcement Learning for LLM Reasoning from a Cross-Domain Perspective}},
  author    = {Cheng, Zhoujun and Hao, Shibo and Liu, Tianyang and Zhou, Fan and Xie, Yutao and Yao, Feng and Bian, Yuexin and Dey, Nilabjo and Zhuang, Yonghao and Zha, Yuheng and Gu, Yi and Zhou, Kun and Wang, Yuqi and Li, Yuan and Fan, Richard and She, Jianshu and Gao, Chengqian and Saparov, Abulhair and Killian, Taylor W. and Li, Haonan and Yurochkin, Mikhail and Xing, Eric P. and Liu, Zhengzhong and Hu, Zhiting},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/cheng2025neurips-revisiting/}
}