Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Cen, Zhepeng; Chen, Haolin; Wang, Shiyu; Liu, Zuxin; Liu, Zhiwei; Zhao, Ding; Xiong, Caiming; Wang, Huan; Yao, Weiran

ICLR 2026

Abstract

Large Language Models (LLMs) have achieved remarkable success through imitation learning on vast text corpora, but this paradigm creates a training-generation gap and limits robust reasoning. Reinforcement learning (RL) offers a more data-efficient solution capable of bridging this gap, yet its application has been constrained by a critical data bottleneck: existing RL datasets are orders of magnitude smaller and less diverse than web-scale pre-training corpora. To address this, we introduce the Webscale-RL pipeline, a scalable data engine that systematically converts large-scale pre-training documents into millions of diverse, verifiable question-answer pairs for RL. Using this pipeline, we construct the Webscale-RL dataset, containing 1.2 million examples across more than 9 domains. Our experiments show that the model trained on this dataset significantly outperforms continual pretraining and strong data refinement baselines across a suite of benchmarks. Notably, RL training with our dataset proves substantially more efficient, achieving the performance of continual pre-training with up to 100× fewer tokens. Our work presents a viable path toward scaling RL to pre-training levels, enabling more capable and efficient language models.
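
To make "verifiable question-answer pair" concrete, here is a minimal sketch of one way such an example could be stored and scored during RL. The abstract does not describe the record schema or the reward function, so everything below is an assumption for illustration: the `QAPair` fields, the `normalize` helper, and the exact-match `reward` are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One RL example derived from a source document (hypothetical schema)."""
    question: str
    answer: str   # short reference answer that can be checked automatically
    domain: str   # illustrative domain tag, e.g. "science"


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def reward(pair: QAPair, model_output: str) -> float:
    """Binary reward (assumed): 1.0 if the normalized output matches the reference."""
    return 1.0 if normalize(model_output) == normalize(pair.answer) else 0.0


pair = QAPair(
    question="What is the chemical symbol for gold?",
    answer="Au",
    domain="science",
)
print(reward(pair, " au. "))  # 1.0
print(reward(pair, "Ag"))     # 0.0
```

Exact match after normalization is only the simplest possible verifier; a real pipeline might use a more tolerant check, which is why the reward is kept as a separate function here.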

Cite

Text

Cen et al. "Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels." International Conference on Learning Representations, 2026.

Markdown

[Cen et al. "Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/cen2026iclr-webscalerl/)

BibTeX

@inproceedings{cen2026iclr-webscalerl,
  title     = {{Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels}},
  author    = {Cen, Zhepeng and Chen, Haolin and Wang, Shiyu and Liu, Zuxin and Liu, Zhiwei and Zhao, Ding and Xiong, Caiming and Wang, Huan and Yao, Weiran},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/cen2026iclr-webscalerl/}
}