Prompt Curriculum Learning for Efficient LLM Post-Training

Abstract

Reinforcement learning (RL) is widely used to post-train large language models for tasks such as mathematical reasoning and coding. However, the convergence of RL training remains sensitive to batching and prompt selection strategies. We investigate the factors that affect convergence, including batch size and prompt difficulty. Through large-scale experiments across multiple models and datasets, we show that there exists an optimal batch size that balances generation time and gradient quality, and that prompts of intermediate difficulty (where the model has roughly a 50% chance of success) are the most sample-efficient for model convergence. Motivated by these findings, we propose Prompt Curriculum Learning (PCL), a lightweight algorithm that selects intermediate-difficulty prompts using a learned value model. PCL avoids costly rollouts and efficiently guides training by focusing on the most informative samples. Empirically, PCL either achieves the highest performance or requires significantly less training time to reach comparable performance across a suite of benchmarks. Compared with rollout-based filtering, PCL is $12.1\times$ and $16.9\times$ faster at identifying intermediate-difficulty prompts when training on MATH and DeepScaleR, respectively.
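
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_intermediate_prompts`, the `target` parameter, and the example scores are hypothetical, and the list of predicted success probabilities simply stands in for whatever a learned value model would output for each candidate prompt.

```python
from typing import List, Sequence


def select_intermediate_prompts(
    predicted_success: Sequence[float],
    k: int,
    target: float = 0.5,
) -> List[int]:
    """Return indices of the k prompts whose predicted success rate is closest to `target`.

    `predicted_success[i]` is assumed to be a value-model estimate of the
    probability that the current policy solves prompt i (hypothetical input).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank prompts by distance from the target difficulty; ties go to the lower index.
    ranked = sorted(
        range(len(predicted_success)),
        key=lambda i: (abs(predicted_success[i] - target), i),
    )
    return ranked[:k]


# Illustrative scores only; these are not results from the paper.
scores = [0.05, 0.45, 0.9, 0.55, 0.5, 0.2]
batch = select_intermediate_prompts(scores, k=3)
```

Because the ranking uses only predicted scores, no rollouts are needed to choose the batch, which is the cost saving the abstract attributes to using a value model instead of rollout-based filtering.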

Cite

Text

Gao et al. "Prompt Curriculum Learning for Efficient LLM Post-Training." International Conference on Learning Representations, 2026.

Markdown

[Gao et al. "Prompt Curriculum Learning for Efficient LLM Post-Training." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/gao2026iclr-prompt/)

BibTeX

@inproceedings{gao2026iclr-prompt,
  title     = {{Prompt Curriculum Learning for Efficient LLM Post-Training}},
  author    = {Gao, Zhaolin and Kim, Joongwon and Sun, Wen and Joachims, Thorsten and Wang, Sid and Pang, Richard Yuanzhe and Tan, Liang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/gao2026iclr-prompt/}
}