Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data

Zhao, Yi; Scannell, Aidan; Zhao, Wenshuai; Hou, Yuxin; Cui, Tianyu; Chen, Le; Büchler, Dieter; Solin, Arno; Kannala, Juho; Pajarinen, Joni

ICLR 2026

Abstract

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two techniques: i) experience rehearsal and ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves nearly twice the aggregate score of learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a decent margin.

Cite

Text

Zhao et al. "Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data." International Conference on Learning Representations, 2026.

Markdown

[Zhao et al. "Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-efficient/)

BibTeX

@inproceedings{zhao2026iclr-efficient,
  title     = {{Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data}},
  author    = {Zhao, Yi and Scannell, Aidan and Zhao, Wenshuai and Hou, Yuxin and Cui, Tianyu and Chen, Le and Büchler, Dieter and Solin, Arno and Kannala, Juho and Pajarinen, Joni},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhao2026iclr-efficient/}
}