The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Abstract

The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM RL with the temporally extended Partially Observable Markov Decision Processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.

Cite

Text

Zhang et al. "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey." Transactions on Machine Learning Research, 2026.

Markdown

[Zhang et al. "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/zhang2026tmlr-landscape/)

BibTeX

@article{zhang2026tmlr-landscape,
  title     = {{The Landscape of Agentic Reinforcement Learning for LLMs: A Survey}},
  author    = {Zhang, Guibin and Geng, Hejia and Yu, Xiaohang and Yin, Zhenfei and Zhang, Zaibin and Tan, Zelin and Zhou, Heng and Li, Zhong-Zhi and Xue, Xiangyuan and Li, Yijiang and Zhou, Yifan and Chen, Yang and Zhang, Chen and Fan, Yutao and Wang, Zihu and Huang, Songtao and Velez, Francisco Piedrahita and Liao, Yue and Wang, Hongru and Yang, Mengyue and Ji, Heng and Wang, Jun and Yan, Shuicheng and Torr, Philip and Bai, Lei},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/zhang2026tmlr-landscape/}
}