Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Zheng, Tong; Zhang, Hongming; Yu, Wenhao; Wang, Xiaoyang; Xing, He; Dai, Runpeng; Liu, Rui; Bao, Huiwen; Huang, Chengsong; Huang, Heng; Yu, Dong

ICLR 2026

Abstract

Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging. Existing methods mainly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced learning rather than exploration and generalization. To address this issue, we propose **Parallel-R1**, the first reinforcement learning (RL) framework that instills parallel thinking for complex real-world reasoning tasks. Our framework employs a progressive curriculum that addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel thinking behavior, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully elicits parallel thinking, leading to an 8.4% accuracy improvement over the sequential thinking model trained directly on difficult tasks with RL. Further analysis reveals a distinct shift in the model's thinking patterns: in the early stage, it utilizes parallel thinking as an exploration strategy, while in the later stage, it employs this ability for multi-perspective verification. Most significantly, we validate parallel thinking as a **mid-training exploration scaffold**, where this intermediate phase unlocks a higher performance ceiling after RL, yielding a **42.9%** improvement over the sequential RL baseline.
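
The progressive curriculum described in the abstract (SFT on easier tasks first, then RL on harder ones) can be pictured as a simple two-stage schedule. The sketch below is illustrative only and is not the authors' implementation: the `Task` type, the numeric `difficulty` score, the `easy_threshold` cutoff, and both function names are hypothetical, since the abstract does not say how task difficulty is measured or where the split falls.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Task:
    prompt: str
    difficulty: float  # hypothetical score in [0, 1]; not defined in the abstract


def split_curriculum(
    tasks: List[Task], easy_threshold: float = 0.5
) -> Tuple[List[Task], List[Task]]:
    """Partition tasks into an easy pool (for SFT) and a hard pool (for RL).

    The threshold is an assumption made for illustration.
    """
    easy = [t for t in tasks if t.difficulty <= easy_threshold]
    hard = [t for t in tasks if t.difficulty > easy_threshold]
    return easy, hard


def training_schedule(
    tasks: List[Task], easy_threshold: float = 0.5
) -> List[Tuple[str, List[Task]]]:
    """Return the ordered stages: SFT cold start on easy tasks, then RL on hard ones."""
    easy, hard = split_curriculum(tasks, easy_threshold)
    return [("sft", easy), ("rl", hard)]


if __name__ == "__main__":
    demo = [
        Task("add two small integers", 0.1),
        Task("solve a linear equation", 0.4),
        Task("competition-style geometry problem", 0.9),
    ]
    for stage, pool in training_schedule(demo):
        print(stage, [t.prompt for t in pool])
```

The point of the ordering is the cold-start issue the abstract mentions: the SFT stage only needs to instill the parallel-thinking format on problems where usable trajectories are easy to obtain, and the RL stage then takes over on the problems where exploration matters.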

Cite

Text

Zheng et al. "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Zheng et al. "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zheng2026iclr-parallelr1/)

BibTeX

@inproceedings{zheng2026iclr-parallelr1,
  title     = {{Parallel-R1: Towards Parallel Thinking via Reinforcement Learning}},
  author    = {Zheng, Tong and Zhang, Hongming and Yu, Wenhao and Wang, Xiaoyang and Xing, He and Dai, Runpeng and Liu, Rui and Bao, Huiwen and Huang, Chengsong and Huang, Heng and Yu, Dong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zheng2026iclr-parallelr1/}
}