BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning

Li, Yitang; Luo, Zhengyi; Zhang, Tonghe; Dai, Cunxi; Kanervisto, Anssi; Tirinzoni, Andrea; Weng, Haoyang; Kitani, Kris; Guzek, Mateusz; Touati, Ahmed; Lazaric, Alessandro; Pirotta, Matteo; Shi, Guanya

ICLR 2026

Abstract

Building Behavioral Foundation Models (BFMs) for humanoid robots has the potential to unify diverse control tasks under a single, promptable generalist policy. However, existing approaches are either deployed exclusively on simulated humanoid characters or specialized to specific tasks such as tracking. We propose BFM-Zero, a framework that learns an effective shared latent representation that embeds motions, goals, and rewards into a common space, enabling a single policy to be prompted for multiple downstream tasks without retraining. This well-structured latent space enables versatile and robust whole-body skills on a Unitree G1 humanoid in the real world via diverse inference methods, including zero-shot motion tracking, goal reaching, and reward inference, as well as few-shot optimization-based adaptation. Unlike prior on-policy reinforcement learning (RL) frameworks, BFM-Zero builds upon recent advancements in unsupervised RL and Forward-Backward (FB) models, which offer an objective-centric, explainable, and smooth latent representation of whole-body motions. We further extend BFM-Zero with critical reward shaping, domain randomization, and history-dependent asymmetric learning to bridge the sim-to-real gap. These key design choices are quantitatively ablated in simulation. A first-of-its-kind model, BFM-Zero establishes a step toward scalable, promptable behavioral foundation models for whole-body humanoid control. Webpage: https://lecar-lab.github.io/BFM-Zero/
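
To make the idea of "prompting" a single policy through a shared latent space more concrete, the sketch below shows how goals, rewards, and reference motions could each be mapped to a latent vector z in a generic Forward-Backward setup. It is an illustration under stated assumptions, not the BFM-Zero implementation: the backward embedding `B` here is a hypothetical random map standing in for a trained network, the function names are invented for this example, and the projection onto a sphere of radius sqrt(d) and the window-averaged tracking prompt follow conventions from the broader FB literature rather than anything stated on this page.

```python
import numpy as np


def make_backward_embedding(state_dim, latent_dim, seed=0):
    """Hypothetical stand-in for a trained backward network B(s)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((state_dim, latent_dim)) / np.sqrt(state_dim)

    def B(states):
        # Accept a single state or a batch; always return shape (n, latent_dim).
        return np.tanh(np.atleast_2d(states) @ W)

    return B


def project(z):
    """Rescale z to norm sqrt(d) (an assumed FB-style normalization)."""
    d = z.shape[-1]
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    return np.sqrt(d) * z / np.maximum(norm, 1e-8)


def goal_prompt(B, goal_state):
    """Goal reaching: embed the target state directly."""
    return project(B(goal_state)[0])


def reward_prompt(B, states, rewards):
    """Reward inference: reward-weighted average of embeddings over samples."""
    emb = B(states)                              # (n, d)
    z = (rewards[:, None] * emb).mean(axis=0)    # Monte Carlo estimate of E[r(s) B(s)]
    return project(z)


def tracking_prompt(B, motion, t, horizon):
    """Motion tracking: average embedding of the next `horizon` reference frames."""
    window = motion[t + 1 : t + 1 + horizon]
    return project(B(window).mean(axis=0))
```

In this sketch, all three prompt types produce a vector in the same latent space, so the same z-conditioned policy could consume any of them without retraining. `tracking_prompt` assumes at least one reference frame remains after index `t`.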

Cite

Text

Li et al. "BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-bfmzero/)

BibTeX

@inproceedings{li2026iclr-bfmzero,
  title     = {{BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning}},
  author    = {Li, Yitang and Luo, Zhengyi and Zhang, Tonghe and Dai, Cunxi and Kanervisto, Anssi and Tirinzoni, Andrea and Weng, Haoyang and Kitani, Kris and Guzek, Mateusz and Touati, Ahmed and Lazaric, Alessandro and Pirotta, Matteo and Shi, Guanya},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-bfmzero/}
}