A Bilevel Reinforcement Learning Framework with Language Prior Knowledge

Abstract

Large language models (LLMs) show promise in tackling complicated practical challenges by combining action-based policies with chain-of-thought (CoT) reasoning. The effectiveness of this approach, however, depends on high-quality prompts. Currently, these prompts are handcrafted with extensive human labor, resulting in CoT policies that frequently fail to generalise. Human intervention is also required to develop grounding functions that ensure low-level controllers appropriately process CoT reasoning. In this paper, we propose a comprehensive end-to-end training framework for complex task-solving that utilises language prior knowledge embedded within LLMs or provided by human experts. To this end, we introduce a new leader-follower reinforcement learning framework that incorporates a prompt policy, a CoT process, and an action policy. The prompt policy asks pertinent questions based on historical observations, leading the CoT process to consider the anticipated goals and generate state-adaptive thoughts that lead to decisive, high-performing actions. To induce these high-quality actions, the prompt policy has its own objective in our system, encouraging it to adapt to the behavior of the action policy. The action policy subsequently learns to comprehend and integrate the CoT outputs to take precise actions. Empirical results demonstrate that our framework outperforms leading methods in six popular decision-making benchmark environments, including Overcooked and ALFWorld.
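
The data flow described in the abstract (prompt policy, then CoT process, then action policy) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PROMPTS`, `prompt_policy`, `cot_process`, `action_policy`, and `rollout_step` are hypothetical, the candidate questions are invented for illustration, the CoT step is a string-formatting stand-in for an LLM call, and both policies are random placeholders rather than learned networks. No training or bilevel objective is shown.

```python
import random
from typing import List, Tuple

# Hypothetical candidate questions; the paper's actual prompt set is not given here.
PROMPTS = ["What is the next subgoal?", "Which object is needed now?"]


def prompt_policy(history: List[str], rng: random.Random) -> int:
    """Leader: pick a question index given the observation history.

    Random stand-in; a learned policy would condition on `history`.
    """
    return rng.randrange(len(PROMPTS))


def cot_process(question: str, observation: str) -> str:
    """Stand-in for an LLM call that turns a question into a 'thought'."""
    return f"Q: {question} | obs: {observation} | thought: placeholder reasoning"


def action_policy(observation: str, thought: str, n_actions: int,
                  rng: random.Random) -> int:
    """Follower: pick an action given the observation and the thought.

    Random stand-in; a learned policy would condition on both inputs.
    """
    return rng.randrange(n_actions)


def rollout_step(history: List[str], n_actions: int,
                 rng: random.Random) -> Tuple[int, str, int]:
    """One decision step: question -> thought -> action."""
    observation = history[-1]
    q_idx = prompt_policy(history, rng)
    thought = cot_process(PROMPTS[q_idx], observation)
    action = action_policy(observation, thought, n_actions, rng)
    return q_idx, thought, action


if __name__ == "__main__":
    rng = random.Random(0)
    print(rollout_step(["agent is at the counter"], n_actions=4, rng=rng))
```

In this sketch the leader's output (the chosen question) shapes the thought that the follower receives, which is the dependency the leader-follower framing refers to; how each policy is rewarded and updated is left out because the abstract does not specify it.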

Cite

Text

Yan et al. "A Bilevel Reinforcement Learning Framework with Language Prior Knowledge." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06106-5_13

Markdown

[Yan et al. "A Bilevel Reinforcement Learning Framework with Language Prior Knowledge." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/yan2025ecmlpkdd-bilevel/) doi:10.1007/978-3-032-06106-5_13

BibTeX

@inproceedings{yan2025ecmlpkdd-bilevel,
  title     = {{A Bilevel Reinforcement Learning Framework with Language Prior Knowledge}},
  author    = {Yan, Xue and Song, Yan and Cui, Xinyu and Christianos, Filippos and Zhang, Haifeng and Wang, Jun and Mguni, David},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {216--232},
  doi       = {10.1007/978-3-032-06106-5_13},
  url       = {https://mlanthology.org/ecmlpkdd/2025/yan2025ecmlpkdd-bilevel/}
}