EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Abstract

We introduce EnerVerse, a generative robotics foundation model that constructs and interprets embodied spaces. EnerVerse employs a chunk-wise autoregressive video diffusion framework to predict future embodied spaces from instructions, enhanced by a sparse context memory for long-term reasoning. To model the 3D robotics world, we adopt a multi-view video representation, providing rich perspectives to address challenges like motion ambiguity and 3D grounding. Additionally, EnerVerse-D, a data engine pipeline combining generative modeling with 4D Gaussian Splatting, forms a self-reinforcing data loop to reduce the sim-to-real gap. Leveraging these innovations, EnerVerse translates 4D world representations into physical actions via a policy head (EnerVerse-A), achieving state-of-the-art performance in both simulation and real-world tasks. For efficiency, EnerVerse-A reuses features from the first denoising step and predicts action chunks, achieving about 280 ms per 8-step action chunk on a single RTX 4090. Further video demos and dataset samples can be found on our project page.
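To put the reported latency in per-action terms, the short sketch below divides the roughly 280 ms chunk time by the 8 actions in a chunk. It assumes chunks are produced back to back and that every action in a chunk is executed; the function name `amortized_action_latency` is hypothetical and not part of EnerVerse.

```python
def amortized_action_latency(chunk_latency_ms: float, actions_per_chunk: int) -> tuple[float, float]:
    """Return (milliseconds per action, actions per second) for chunked prediction.

    Assumption: inference cost is spread evenly over the actions in a chunk.
    """
    if actions_per_chunk <= 0:
        raise ValueError("actions_per_chunk must be positive")
    per_action_ms = chunk_latency_ms / actions_per_chunk
    actions_per_second = 1000.0 / per_action_ms
    return per_action_ms, actions_per_second


# Figures taken from the abstract: about 280 ms per 8-step action chunk.
per_action_ms, rate_hz = amortized_action_latency(280.0, 8)
print(f"{per_action_ms:.1f} ms per action, about {rate_hz:.1f} actions/s")
```

Under these assumptions the amortized cost is about 35 ms per action, which illustrates why predicting a chunk per inference pass is cheaper than running the model once per action.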

Cite

Text

Huang et al. "EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Huang et al. "EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/huang2025neurips-enerverse/)

BibTeX

@inproceedings{huang2025neurips-enerverse,
  title     = {{EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation}},
  author    = {Huang, Siyuan and Chen, Liliang and Zhou, Pengfei and Chen, Shengcong and Liao, Yue and Jiang, Zhengkai and Hu, Yue and Gao, Peng and Li, Hongsheng and Yao, Maoqing and Ren, Guanghui},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/huang2025neurips-enerverse/}
}