The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data

Thomas Pouplin, Kasia Kobalczyk, Hao Sun, Mihaela Van Der Schaar

ICML 2025 pp. 49628-49658

Abstract

Developing autonomous agents capable of performing complex, multi-step decision-making tasks specified in natural language remains a significant challenge, particularly in realistic settings where labeled data is scarce and real-time experimentation is impractical. Existing reinforcement learning (RL) approaches often struggle to generalize to unseen goals and states, limiting their applicability. In this paper, we introduce $\textit{TEDUO}$, a novel training pipeline for offline language-conditioned policy learning in symbolic environments. Unlike conventional methods, $\textit{TEDUO}$ operates on readily available, unlabeled datasets and addresses the challenge of generalization to previously unseen goals and states. Our approach harnesses large language models (LLMs) in a dual capacity: first, as automatization tools augmenting offline datasets with richer annotations, and second, as generalizable instruction-following agents. Empirical results demonstrate that $\textit{TEDUO}$ achieves data-efficient learning of robust language-conditioned policies, accomplishing tasks beyond the reach of conventional RL frameworks or out-of-the-box LLMs alone.

Cite

Text

Pouplin et al. "The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Pouplin et al. "The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/pouplin2025icml-synergy/)

BibTeX

@inproceedings{pouplin2025icml-synergy,
  title     = {{The Synergy of LLMs \& RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data}},
  author    = {Pouplin, Thomas and Kobalczyk, Kasia and Sun, Hao and Van Der Schaar, Mihaela},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {49628--49658},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/pouplin2025icml-synergy/}
}