RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

Abstract

Recent advancements in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various multimodal contexts. However, their application in robotic scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations arise because current MLLMs lack three essential robotic brain capabilities: Planning Capability, which involves decomposing complex manipulation instructions into manageable sub-tasks; Affordance Perception, the ability to recognize and interpret the affordances of interactive objects; and Trajectory Prediction, the foresight to anticipate the complete manipulation trajectory necessary for successful execution. To enhance the robotic brain's core capabilities from abstract to concrete, we introduce ShareRobot, a high-quality heterogeneous dataset that labels multi-dimensional information such as task planning, object affordance, and end-effector trajectory. ShareRobot's diversity and accuracy have been meticulously refined by three human annotators. Building on this dataset, we developed RoboBrain, an MLLM-based model that combines robotic and general multimodal data, utilizes a multi-stage training strategy, and incorporates long videos and high-resolution images to improve its robotic manipulation capabilities. Extensive experiments demonstrate that RoboBrain achieves state-of-the-art performance across various robotic tasks, highlighting its potential to advance robotic brain capabilities.

Cite

Text

Ji et al. "RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00168

Markdown

[Ji et al. "RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/ji2025cvpr-robobrain/) doi:10.1109/CVPR52734.2025.00168

BibTeX

@inproceedings{ji2025cvpr-robobrain,
  title     = {{RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete}},
  author    = {Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and Xue, Xinda and Su, Qinghang and Lyu, Huaihai and Zheng, Xiaolong and Liu, Jiaming and Wang, Zhongyuan and Zhang, Shanghang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {1724--1734},
  doi       = {10.1109/CVPR52734.2025.00168},
  url       = {https://mlanthology.org/cvpr/2025/ji2025cvpr-robobrain/}
}