Natural Language Reinforcement Learning
Abstract
Reinforcement Learning (RL) mathematically formulates decision-making with the Markov Decision Process (MDP). With MDPs, researchers have achieved remarkable breakthroughs across various domains, including games, robotics, and language models. This paper seeks a new possibility, Natural Language Reinforcement Learning (NLRL), by extending the traditional MDP to a natural language-based representation space. Specifically, NLRL innovatively redefines RL principles, including task objectives, policy, value function, Bellman equation, and policy iteration, into their language counterparts. With recent advancements in large language models (LLMs), NLRL can be practically implemented to achieve RL-like policy and value improvement through either pure prompting or gradient-based training. Experiments on Maze, Breakthrough, and Tic-Tac-Toe games demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework across diverse use cases.
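For context, the classical machinery the abstract says NLRL redefines (value function, Bellman equation, policy iteration) can be sketched on a toy MDP. The 2-state, 2-action MDP below is a hypothetical example, not from the paper; the code implements standard tabular policy iteration, which NLRL generalizes to language-based counterparts.

```python
import numpy as np

# Toy MDP (hypothetical, for illustration only):
# P[s, a, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.1, 0.9]],  # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],  # rewards in state 0 for actions 0, 1
    [0.0, 2.0],  # rewards in state 1 for actions 0, 1
])
gamma = 0.9  # discount factor

def policy_evaluation(policy, tol=1e-10):
    """Iterate the Bellman equation V(s) = R(s, pi(s)) + gamma * E[V(s')]."""
    V = np.zeros(len(policy))
    while True:
        V_new = np.array([
            R[s, policy[s]] + gamma * P[s, policy[s]] @ V
            for s in range(len(policy))
        ])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = np.zeros(P.shape[0], dtype=int)
    while True:
        V = policy_evaluation(policy)
        Q = R + gamma * P @ V          # Q[s, a] = R(s, a) + gamma * E[V(s')]
        new_policy = Q.argmax(axis=1)  # greedy improvement step
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

pi, V = policy_iteration()
print(pi, V)
```

In NLRL, the paper replaces these numeric quantities and update rules with natural-language analogues evaluated by LLMs, rather than tabular arrays.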
Cite
Text

Feng et al. "Natural Language Reinforcement Learning." ICLR 2025 Workshops: SSI-FM, 2025.

Markdown

[Feng et al. "Natural Language Reinforcement Learning." ICLR 2025 Workshops: SSI-FM, 2025.](https://mlanthology.org/iclrw/2025/feng2025iclrw-natural/)

BibTeX
@inproceedings{feng2025iclrw-natural,
  title     = {{Natural Language Reinforcement Learning}},
  author    = {Feng, Xidong and Liu, Bo and Wan, Ziyu and Fu, Haotian and Koushik, Girish A. and Hu, Zhiyuan and Yang, Mengyue and Wen, Ying and Wang, Jun},
  booktitle = {ICLR 2025 Workshops: SSI-FM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/feng2025iclrw-natural/}
}