The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data
Abstract
Developing autonomous agents capable of performing complex, multi-step decision-making tasks specified in natural language remains a significant challenge, particularly in realistic settings where labeled data is scarce and real-time experimentation is impractical. Existing reinforcement learning (RL) approaches often struggle to generalize to unseen goals and states, limiting their applicability. In this paper, we introduce $\textit{TEDUO}$, a novel training pipeline for offline language-conditioned policy learning in symbolic environments. Unlike conventional methods, $\textit{TEDUO}$ operates on readily available, unlabeled datasets and addresses the challenge of generalization to previously unseen goals and states. Our approach harnesses large language models (LLMs) in a dual capacity: first, as automatization tools augmenting offline datasets with richer annotations, and second, as generalizable instruction-following agents. Empirical results demonstrate that $\textit{TEDUO}$ achieves data-efficient learning of robust language-conditioned policies, accomplishing tasks beyond the reach of conventional RL frameworks or out-of-the-box LLMs alone.
Cite
Text
Pouplin et al. "The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Pouplin et al. "The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/pouplin2025icml-synergy/)
BibTeX
@inproceedings{pouplin2025icml-synergy,
title = {{The Synergy of LLMs \& RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-Fidelity Data}},
author = {Pouplin, Thomas and Kobalczyk, Kasia and Sun, Hao and Van Der Schaar, Mihaela},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {49628--49658},
volume = {267},
url = {https://mlanthology.org/icml/2025/pouplin2025icml-synergy/}
}