RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model
Abstract
Minigolf is an exemplary real-world game for examining embodied intelligence, requiring challenging spatial and kinodynamic understanding to putt the ball. Additionally, reflective reasoning is required if the feasibility of a challenge is not ensured. We introduce RoboGolf, a VLM-based framework that combines dual-camera perception with closed-loop action refinement, augmented by a reflective equilibrium loop. The core of both loops is powered by finetuned VLMs. We analyze the capabilities of the framework in an offline inference setting, relying on an extensive set of recorded trajectories. Exemplary demonstrations of the analyzed problem domain are available at https://robogolfvlm.github.io/.
Cite
Text
Zhou et al. "RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model." ICML 2024 Workshops: MFM-EAI, 2024.
Markdown
[Zhou et al. "RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model." ICML 2024 Workshops: MFM-EAI, 2024.](https://mlanthology.org/icmlw/2024/zhou2024icmlw-robogolf/)
BibTeX
@inproceedings{zhou2024icmlw-robogolf,
title = {{RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model}},
author = {Zhou, Hantao and Ji, Tianying and Sommerhalder, Lukas and Görner, Michael and Hendrich, Norman and Sun, Fuchun and Zhang, Jianwei and Xu, Huazhe},
booktitle = {ICML 2024 Workshops: MFM-EAI},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/zhou2024icmlw-robogolf/}
}