DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Yu, Qiying; Zhang, Zheng; Zhu, Ruofei; Yuan, Yufeng; Zuo, Xiaochen; Yue, Yu; Dai, Weinan; Fan, Tiantian; Liu, Gaohong; Liu, Juncai; Liu, LingJun; Liu, Xin; Lin, Haibin; Lin, Zhiqi; Ma, Bole; Sheng, Guangming; Tong, Yuxuan; Zhang, Chi; Zhang, Mofan; Zhang, Ru; Zhang, Wang; Zhu, Hang; Zhu, Jinhua; Chen, Jiaze; Chen, Jiangjie; Wang, Chengyi; Yu, Hongli; Song, Yuxuan; Wei, Xiangpeng; Zhou, Hao; Liu, Jingjing; Ma, Wei-Ying; Zhang, Ya-Qin; Yan, Lin; Wu, Yonghui; Wang, Mingxuan

NeurIPS 2025

Abstract

Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (as in the OpenAI o1 blog and the DeepSeek R1 technical report), so the community still struggles to reproduce their RL training results. We propose the **D**ecoupled Clip and **D**ynamic s**A**mpling **P**olicy **O**ptimization (**DAPO**) algorithm and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using the Qwen2.5-32B base model. Unlike previous works that withhold training details, we introduce four key techniques of our algorithm that make large-scale LLM RL a success. In addition, we open-source our training code, which is built on the verl framework, along with a carefully curated and processed dataset. These components of our open-source system enhance reproducibility and support future research in large-scale LLM RL.

Cite

Text

Yu et al. "DAPO: An Open-Source LLM Reinforcement Learning System at Scale." Advances in Neural Information Processing Systems, 2025.

Markdown

[Yu et al. "DAPO: An Open-Source LLM Reinforcement Learning System at Scale." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/yu2025neurips-dapo/)

BibTeX

@inproceedings{yu2025neurips-dapo,
  title     = {{DAPO: An Open-Source LLM Reinforcement Learning System at Scale}},
  author    = {Yu, Qiying and Zhang, Zheng and Zhu, Ruofei and Yuan, Yufeng and Zuo, Xiaochen and Yue, Yu and Dai, Weinan and Fan, Tiantian and Liu, Gaohong and Liu, Juncai and Liu, LingJun and Liu, Xin and Lin, Haibin and Lin, Zhiqi and Ma, Bole and Sheng, Guangming and Tong, Yuxuan and Zhang, Chi and Zhang, Mofan and Zhang, Ru and Zhang, Wang and Zhu, Hang and Zhu, Jinhua and Chen, Jiaze and Chen, Jiangjie and Wang, Chengyi and Yu, Hongli and Song, Yuxuan and Wei, Xiangpeng and Zhou, Hao and Liu, Jingjing and Ma, Wei-Ying and Zhang, Ya-Qin and Yan, Lin and Wu, Yonghui and Wang, Mingxuan},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/yu2025neurips-dapo/}
}