Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Abstract

Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans. Code and videos: https://www.nicklashansen.com/rlpuppeteer
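The two-level design described above can be sketched as a simple control loop: a high-level agent maps visual observations to an abstract command, and a low-level agent turns that command plus proprioception into whole-body joint actions. This is a minimal illustrative sketch only; all class and function names, the command dimensionality, and the placeholder policies are assumptions, not the authors' actual API or training method.

```python
import numpy as np

class HighLevelAgent:
    """Maps visual observations to an abstract command vector.
    Placeholder policy: the paper's agent would instead use a learned
    world model to select commands that maximize task reward."""
    def __init__(self, command_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.command_dim = command_dim

    def act(self, visual_obs):
        # Hypothetical stand-in for a learned vision-conditioned policy.
        return self.rng.standard_normal(self.command_dim)

class LowLevelAgent:
    """Tracks the high-level command with whole-body joint actions.
    Placeholder policy: the paper's low-level agent is likewise trained
    with rewards to execute the commanded motion."""
    def __init__(self, action_dim, seed=1):
        self.rng = np.random.default_rng(seed)
        self.action_dim = action_dim

    def act(self, proprio_obs, command):
        # tanh keeps joint actions in [-1, 1], a common convention.
        return np.tanh(self.rng.standard_normal(self.action_dim))

def control_step(high, low, visual_obs, proprio_obs):
    command = high.act(visual_obs)          # vision -> command
    action = low.act(proprio_obs, command)  # command -> joint action
    return action

# 56-dimensional action space, matching the 56-DoF humanoid in the abstract;
# the command dimension (8) and observation shapes are arbitrary choices.
high = HighLevelAgent(command_dim=8)
low = LowLevelAgent(action_dim=56)
action = control_step(high, low, np.zeros((64, 64, 3)), np.zeros(56))
```

The key design point the abstract emphasizes is that neither level relies on hand-designed skill primitives: both agents are trained end-to-end with rewards, with the command vector serving as the sole interface between them.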

Cite

Text

Hansen et al. "Hierarchical World Models as Visual Whole-Body Humanoid Controllers." International Conference on Learning Representations, 2025.

Markdown

[Hansen et al. "Hierarchical World Models as Visual Whole-Body Humanoid Controllers." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/hansen2025iclr-hierarchical/)

BibTeX

@inproceedings{hansen2025iclr-hierarchical,
  title     = {{Hierarchical World Models as Visual Whole-Body Humanoid Controllers}},
  author    = {Hansen, Nicklas and Jyothir, S V and Sobal, Vlad and LeCun, Yann and Wang, Xiaolong and Su, Hao},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/hansen2025iclr-hierarchical/}
}