Reward-Free World Models for Online Imitation Learning

Abstract

Imitation learning (IL) enables agents to acquire skills directly from expert demonstrations, providing a compelling alternative to reinforcement learning. However, prior online IL approaches struggle on challenging tasks with high-dimensional inputs and complex dynamics. In this work, we propose a novel approach to online imitation learning that leverages reward-free world models. Our method learns environmental dynamics entirely in latent spaces without reconstruction, enabling efficient and accurate modeling. We adopt the inverse soft-Q learning objective, reformulating the optimization process in the Q-policy space to mitigate the instability associated with traditional optimization in the reward-policy space. By employing a learned latent dynamics model and planning for control, our approach consistently achieves stable, expert-level performance in tasks with high-dimensional observation or action spaces and intricate dynamics. We evaluate our method on a diverse set of benchmarks, including DMControl, MyoSuite, and ManiSkill2, demonstrating superior empirical performance compared to existing approaches.
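
The abstract names two key ingredients: a reconstruction-free latent encoder and an inverse soft-Q (IQ-Learn-style) objective optimized in Q-policy space rather than reward-policy space. The snippet below is a minimal, illustrative sketch of how such a loss could be assembled on latent states. It is not the authors' implementation; the module names (`Encoder`, `Critic`), the assumed `policy.sample(z)` interface, and the hyperparameters (`alpha`, `gamma`, regularization weight) are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code) of an IQ-Learn-style
# inverse soft-Q loss computed on latent states from a reward-free encoder.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps raw observations to a latent state z (no reconstruction loss)."""
    def __init__(self, obs_dim, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Q(z, a) defined directly in latent space."""
    def __init__(self, latent_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + act_dim, 256), nn.ELU(),
                                 nn.Linear(256, 1))
    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1)).squeeze(-1)

def soft_value(critic, z, policy, alpha=0.1, n_samples=8):
    """Monte-Carlo estimate of V(z) = E_{a~pi}[Q(z,a) - alpha * log pi(a|z)]."""
    vals = []
    for _ in range(n_samples):
        a, log_pi = policy.sample(z)  # assumed policy interface: action, log-prob
        vals.append(critic(z, a) - alpha * log_pi)
    return torch.stack(vals).mean(0)

def inverse_soft_q_loss(critic, policy, encoder, expert_batch, online_batch, gamma=0.99):
    """Optimize in Q-policy space: the implicit reward r(z,a) = Q(z,a) - gamma*V(z')
    is pushed up on expert transitions, while a value-difference term over the
    mixed expert/online data and a chi^2 regularizer keep the objective bounded."""
    ze, ze_next = encoder(expert_batch["obs"]), encoder(expert_batch["next_obs"])
    zo, zo_next = encoder(online_batch["obs"]), encoder(online_batch["next_obs"])

    # Expert term: maximize the implicit reward on expert transitions.
    r_expert = critic(ze, expert_batch["act"]) - gamma * soft_value(critic, ze_next, policy)
    expert_term = -r_expert.mean()

    # Value-difference term over expert + online transitions.
    z = torch.cat([ze, zo])
    z_next = torch.cat([ze_next, zo_next])
    v_diff = (soft_value(critic, z, policy) - gamma * soft_value(critic, z_next, policy)).mean()

    # chi^2-style regularization of the implicit reward.
    reg = 0.5 * (r_expert ** 2).mean()
    return expert_term + v_diff + reg
```

In a full pipeline, the same latent states would also feed a learned latent dynamics model used for planning at control time; that component is omitted here for brevity.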

Cite

Text

Li et al. "Reward-Free World Models for Online Imitation Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Li et al. "Reward-Free World Models for Online Imitation Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/li2025icml-rewardfree/)

BibTeX

@inproceedings{li2025icml-rewardfree,
  title     = {{Reward-Free World Models for Online Imitation Learning}},
  author    = {Li, Shangzhe and Huang, Zhiao and Su, Hao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {34702--34724},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/li2025icml-rewardfree/}
}