Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning

Abstract

Reinforcement Learning (RL) has proven highly effective at enhancing the complex reasoning abilities of Large Language Models (LLMs), yet the underlying mechanisms driving this success remain largely opaque. Our analysis reveals that puzzling phenomena like "aha moments", "length-scaling" and entropy dynamics are not disparate occurrences but hallmarks of an emergent reasoning hierarchy, akin to the separation of high-level strategic planning from low-level procedural execution in human cognition. We uncover a compelling two-phase dynamic: initially, a model is constrained by procedural correctness and must improve its low-level skills. The learning bottleneck then decisively shifts, with performance gains being driven by the exploration and mastery of high-level strategic planning. This insight exposes a core inefficiency in prevailing RL algorithms like GRPO, which apply optimization pressure agnostically and dilute the learning signal across all tokens. To address this, we propose Hierarchy-Aware Credit Assignment (HICRA), an algorithm that concentrates optimization efforts on high-impact planning tokens. Our extensive experiments validate that HICRA significantly outperforms strong baselines, and offer deep insights into how reasoning advances through the lens of strategic exploration.
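
The contrast the abstract draws between GRPO's token-agnostic signal and credit concentrated on planning tokens can be illustrated with a minimal sketch. The abstract does not give HICRA's update rule, so everything below is an assumption made for illustration: the function names, the `planning_mask` input, the `alpha` parameter, and the amplification form `A + alpha * |A|` are hypothetical and should not be read as the authors' method. Only the group-normalized advantage in the first function follows the standard GRPO formulation.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each response's reward is normalized
    against the other responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def hierarchy_aware_advantages(advantage, planning_mask, alpha=0.2):
    """Hypothetical per-token credit assignment (an assumption, not the
    paper's stated rule): every token starts from the same sequence-level
    advantage, and tokens flagged as planning tokens get an extra
    alpha * |advantage| added on top."""
    return [
        advantage + alpha * abs(advantage) if is_planning else advantage
        for is_planning in planning_mask
    ]


# Four sampled responses to one prompt, with binary correctness rewards.
advantages = group_normalized_advantages([1.0, 0.0, 1.0, 0.0])

# Token-agnostic baseline: all tokens of response 0 share one advantage.
uniform = [advantages[0]] * 3

# Hypothetical hierarchy-aware variant: the middle token is a planning token.
weighted = hierarchy_aware_advantages(advantages[0], [False, True, False])
```

In this sketch the planning token receives a larger positive advantage in a correct response and a less negative one in an incorrect response, which is one possible way to shift optimization pressure toward strategic tokens. How planning tokens are identified is left outside the sketch.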

Cite

Text

Wang et al. "Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-emergent/)

BibTeX

@inproceedings{wang2026iclr-emergent,
  title     = {{Emergent Hierarchical Reasoning in LLMs Through Reinforcement Learning}},
  author    = {Wang, Haozhe and Xu, Qixin and Liu, Che and Wu, Junhong and Lin, Fangzhen and Chen, Wenhu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-emergent/}
}