Generalist World Model Pre-Training for Efficient Reinforcement Learning
Abstract
Sample-efficient robot learning is a longstanding goal in robotics. Inspired by the success of scaling in vision and language, the robotics community is now investigating large-scale offline datasets for robot learning. However, existing methods often require expert and/or reward-labeled task-specific data, which is costly to collect and limits their applicability in practice. In this paper, we consider a more realistic setting where the offline data consists of reward-free, non-expert, multi-embodiment data. We show that generalist world model pre-training (WPT), together with retrieval-based experience rehearsal and execution guidance, enables efficient reinforcement learning (RL) and fast task adaptation with such non-curated data. In experiments over 72 visuomotor tasks, spanning 6 different embodiments and covering hard exploration, complex dynamics, and varied visual properties, WPT achieves 35.65% and 35% higher aggregated scores than two widely used learning-from-scratch baselines, respectively.
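The abstract outlines a two-stage pipeline: pre-train a world model on reward-free, non-expert, multi-embodiment data, then reuse it for downstream RL with retrieval-based experience rehearsal. The sketch below is a minimal illustration of that idea, not the authors' implementation; all class names, shapes, hyperparameters, and the specific retrieval rule are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of: (1) reward-free latent world model
# pre-training, (2) retrieval of nearby offline transitions for rehearsal.
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Encoder + latent dynamics; no reward head is needed for pre-training."""
    def __init__(self, obs_dim=64, act_dim=8, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))

    def forward(self, obs, act):
        z = self.encoder(obs)
        return self.dynamics(torch.cat([z, act], dim=-1))

def pretrain(model, offline_batches, lr=3e-4):
    """Reward-free pre-training: predict the next latent state from (obs, act)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for obs, act, next_obs in offline_batches:
        target = model.encoder(next_obs).detach()
        loss = nn.functional.mse_loss(model(obs, act), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def retrieve_for_rehearsal(model, offline_obs, online_obs, k=4):
    """Retrieval-based rehearsal (sketch): pick offline transitions whose latent
    states are closest to the current online observations."""
    with torch.no_grad():
        z_off = model.encoder(offline_obs)          # (N, latent)
        z_on = model.encoder(online_obs)            # (B, latent)
        dists = torch.cdist(z_on, z_off)            # (B, N) pairwise distances
        idx = dists.topk(k, largest=False).indices  # k nearest offline indices per query
    return idx

# Toy usage with random tensors standing in for multi-embodiment offline data.
model = LatentWorldModel()
batches = [(torch.randn(16, 64), torch.randn(16, 8), torch.randn(16, 64)) for _ in range(5)]
model = pretrain(model, batches)
idx = retrieve_for_rehearsal(model, torch.randn(100, 64), torch.randn(4, 64))
```

The retrieved indices would be used to mix offline transitions into the replay buffer during task-specific RL fine-tuning; execution guidance from the pre-trained model is omitted here for brevity.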
Cite
@inproceedings{zhao2025iclrw-generalist,
title = {{Generalist World Model Pre-Training for Efficient Reinforcement Learning}},
author = {Zhao, Yi and Scannell, Aidan and Hou, Yuxin and Cui, Tianyu and Chen, Le and Büchler, Dieter and Solin, Arno and Kannala, Juho and Pajarinen, Joni},
booktitle = {ICLR 2025 Workshops: World_Models},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/zhao2025iclrw-generalist/}
}