RoboCerebra: A Large-Scale Benchmark for Long-Horizon Robotic Manipulation Evaluation

Abstract

Recent advances in vision-language models (VLMs) have enabled instruction-conditioned robotic systems with improved generalization. However, most existing work focuses on reactive System 1 policies, underutilizing VLMs’ strengths in semantic reasoning and long-horizon planning. These System 2 capabilities—characterized by deliberative, goal-directed thinking—remain underexplored due to the limited temporal scale and structural complexity of current benchmarks. To address this gap, we introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation. RoboCerebra includes: (1) a large-scale simulation dataset with extended task horizons and diverse subtask sequences in household environments; (2) a hierarchical framework combining a high-level VLM planner with a low-level vision-language-action (VLA) controller; and (3) an evaluation protocol targeting planning, reflection, and memory through structured System 1–System 2 interaction. The dataset is constructed via a top-down pipeline, where GPT generates task instructions and decomposes them into subtask sequences. Human operators execute the subtasks in simulation, yielding high-quality trajectories with dynamic object variations. Compared to prior benchmarks, RoboCerebra features significantly longer action sequences and denser annotations. We further benchmark state-of-the-art VLMs as System 2 modules and analyze their performance across key cognitive dimensions, advancing the development of more capable and generalizable robotic planners.

Cite

Text

Han et al. "RoboCerebra: A Large-Scale Benchmark for Long-Horizon Robotic Manipulation Evaluation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Han et al. "RoboCerebra: A Large-Scale Benchmark for Long-Horizon Robotic Manipulation Evaluation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/han2025neurips-robocerebra/)

BibTeX

@inproceedings{han2025neurips-robocerebra,
  title     = {{RoboCerebra: A Large-Scale Benchmark for Long-Horizon Robotic Manipulation Evaluation}},
  author    = {Han, Songhao and Qiu, Boxiang and Liao, Yue and Huang, Siyuan and Gao, Chen and Yan, Shuicheng and Liu, Si},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/han2025neurips-robocerebra/}
}