Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation

Abstract

Robotic manipulation in unstructured environments requires systems that can generalize across diverse tasks while maintaining robust and reliable performance. We introduce GVF-TAPE, a closed-loop framework that combines generative visual foresight with task-agnostic pose estimation to enable scalable robotic manipulation. GVF-TAPE employs a generative video model to predict future RGB-D frames from a single RGB side-view image and a task description, offering visual plans that guide robot actions. A decoupled pose estimation model then extracts end-effector poses from the predicted frames, translating them into executable commands via low-level controllers. By iteratively integrating video foresight and pose estimation in a closed loop, GVF-TAPE achieves real-time, adaptive manipulation across a broad range of tasks. Extensive experiments in both simulation and real-world settings demonstrate that our approach reduces reliance on task-specific action data and generalizes effectively, providing a practical and scalable solution for intelligent robotic systems.
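
The abstract describes a predict-estimate-execute loop: a generative video model produces a visual plan, a task-agnostic pose estimator converts each predicted frame into an end-effector pose, and a low-level controller executes it. The sketch below is a minimal illustration of that loop under our own assumptions; all class and method names (`VideoForesightModel`, `PoseEstimator`, `robot.execute_pose`, etc.) are hypothetical placeholders, not the authors' API.

```python
# Hypothetical sketch of the GVF-TAPE closed loop described in the abstract.
# Every name below is an illustrative placeholder, not the authors' code.


class VideoForesightModel:
    """Generative video model: predicts future RGB-D frames from a single
    RGB side-view image and a task description (placeholder)."""

    def predict_frames(self, rgb_image, task_description):
        raise NotImplementedError


class PoseEstimator:
    """Task-agnostic model that extracts an end-effector pose from a
    predicted RGB-D frame (placeholder)."""

    def estimate_pose(self, rgbd_frame):
        raise NotImplementedError


def gvf_tape_loop(robot, foresight, pose_estimator, task_description):
    """Iterate visual foresight and pose estimation until the task is done."""
    while not robot.task_done():
        rgb = robot.capture_side_view()           # current side-view observation
        # Visual plan: a short horizon of predicted future RGB-D frames.
        frames = foresight.predict_frames(rgb, task_description)
        for frame in frames:
            pose = pose_estimator.estimate_pose(frame)  # target end-effector pose
            robot.execute_pose(pose)              # handed to a low-level controller
```

Because pose estimation is decoupled from the video model, the estimator needs no task-specific action data; re-planning at each loop iteration is what makes the pipeline closed-loop and adaptive.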

Cite

Text

Zhang et al. "Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Zhang et al. "Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/zhang2025corl-generative/)

BibTeX

@inproceedings{zhang2025corl-generative,
  title     = {{Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation}},
  author    = {Zhang, Chuye and Zhang, Xiaoxiong and Zheng, Linfang and Pan, Wei and Zhang, Wei},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  year      = {2025},
  pages     = {2823--2846},
  volume    = {305},
  url       = {https://mlanthology.org/corl/2025/zhang2025corl-generative/}
}