ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Abstract
Vision-Language Models (VLMs) have revolutionized artificial intelligence and robotics due to their commonsense reasoning capabilities. In robotic manipulation, VLMs are used primarily as high-level planners, but recent work has also studied their lower-level reasoning ability, which refers to making decisions about precise robot movements. However, the community currently lacks a clear and common benchmark that can evaluate how well VLMs can aid low-level reasoning in robotics. Consequently, we propose a novel benchmark, ManipBench, to evaluate the low-level robot manipulation reasoning capabilities of VLMs across various dimensions, including how well they understand object-object interactions and deformable object manipulation. We extensively test 35 common and state-of-the-art VLM families on our benchmark, including variants to test different model sizes. The performance of VLMs varies significantly across tasks, and this performance correlates strongly with trends in our real-world manipulation tasks. Our results also show that a significant gap remains between these models and human-level understanding.
Cite
Text

Zhao et al. "ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Zhao et al. "ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/zhao2025corl-manipbench/)

BibTeX
@inproceedings{zhao2025corl-manipbench,
title = {{ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation}},
author = {Zhao, Enyu and Raval, Vedant and Zhang, Hejia and Mao, Jiageng and Shangguan, Zeyu and Nikolaidis, Stefanos and Wang, Yue and Seita, Daniel},
booktitle = {Proceedings of The 9th Conference on Robot Learning},
year = {2025},
pages = {3413--3462},
volume = {305},
url = {https://mlanthology.org/corl/2025/zhao2025corl-manipbench/}
}