FLARE: Robot Learning with Implicit World Modeling

Abstract

We introduce **F**uture **LA**tent **R**epresentation Alignm**E**nt (**FLARE**), a novel framework that integrates predictive world modeling into robot policy learning. By aligning features from a diffusion transformer with latent embeddings of future observations, **FLARE** enables a diffusion transformer policy to anticipate latent representations of future observations, allowing it to reason about long-term consequences while generating actions. Remarkably lightweight, **FLARE** requires only minimal architectural modifications—adding a few tokens to standard vision-language-action (VLA) models—yet delivers substantial performance gains. Across two challenging multitask simulation imitation learning benchmarks spanning single-arm and humanoid tabletop manipulation, **FLARE** achieves state-of-the-art performance, outperforming prior policy learning baselines by up to 26%. Moreover, **FLARE** unlocks the ability to co-train with human egocentric video demonstrations lacking action labels, significantly boosting policy generalization to a novel object with unseen geometry with as few as 1 robot demonstration. Our results establish **FLARE** as a general and scalable approach for combining implicit world modeling with high-frequency robotic control.
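
The alignment idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the loss, so this assumes a cosine-similarity objective between the policy's extra token features and the latent embeddings of a future observation, added to the action loss with a weighting coefficient. The names `alignment_loss`, `total_loss`, and `align_weight`, and the default weight value, are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def alignment_loss(predicted_tokens, future_embeddings):
    """Mean (1 - cosine similarity) over paired token vectors.

    predicted_tokens: features the policy produces at its added tokens.
    future_embeddings: latent embeddings of a future observation
    (assumed here to come from a separate encoder).
    """
    if len(predicted_tokens) != len(future_embeddings):
        raise ValueError("token counts must match")
    sims = [
        cosine_similarity(p, t)
        for p, t in zip(predicted_tokens, future_embeddings)
    ]
    return 1.0 - sum(sims) / len(sims)


def total_loss(action_loss, predicted_tokens, future_embeddings, align_weight=0.2):
    """Action loss plus weighted alignment term (the weight is an assumed value)."""
    return action_loss + align_weight * alignment_loss(
        predicted_tokens, future_embeddings
    )
```

In this sketch, a sample without action labels (such as a human egocentric video frame) could still contribute through `alignment_loss` alone, which is one way to read the co-training claim in the abstract.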

Cite

Text

Zheng et al. "FLARE: Robot Learning with Implicit World Modeling." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Zheng et al. "FLARE: Robot Learning with Implicit World Modeling." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/zheng2025corl-flare/)

BibTeX

@inproceedings{zheng2025corl-flare,
  title     = {{FLARE: Robot Learning with Implicit World Modeling}},
  author    = {Zheng, Ruijie and Wang, Jing and Reed, Scott and Bjorck, Johan and Fang, Yu and Hu, Fengyuan and Jang, Joel and Kundalia, Kaushil and Lin, Zongyu and Magne, Loïc and Narayan, Avnish and Tan, You Liang and Wang, Guanzhi and Wang, Qi and Xiang, Jiannan and Xu, Yinzhen and Ye, Seonghyeon and Kautz, Jan and Huang, Furong and Zhu, Yuke and Fan, Linxi},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  year      = {2025},
  pages     = {3952--3971},
  volume    = {305},
  url       = {https://mlanthology.org/corl/2025/zheng2025corl-flare/}
}