PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation

Abstract

The fragmentation between high-level task semantics and low-level geometric features remains a persistent challenge in robotic manipulation. While vision-language models (VLMs) have shown promise in generating affordance-aware visual representations, the lack of semantic grounding in canonical spaces and reliance on manual annotations severely limit their ability to capture dynamic semantic-affordance relationships. To address these limitations, we propose Primitive-Aware Semantic Grounding (PASG), a closed-loop framework that introduces: (1) automatic primitive extraction through geometric feature aggregation, enabling cross-category detection of keypoints and axes; (2) VLM-driven semantic anchoring that dynamically couples geometric primitives with functional affordances and task-relevant descriptions; and (3) a spatial-semantic reasoning benchmark and a fine-tuned VLM (Qwen2.5VL-PA). We demonstrate PASG's effectiveness in practical robotic manipulation tasks across diverse scenarios, achieving performance comparable to manual annotations. PASG achieves a finer-grained semantic-affordance understanding of objects, establishing a unified paradigm for bridging geometric primitives with task semantics in robotic manipulation.

Cite

Text

Zhu et al. "PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation." International Conference on Computer Vision, 2025.

Markdown

[Zhu et al. "PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhu2025iccv-pasg/)

BibTeX

@inproceedings{zhu2025iccv-pasg,
  title     = {{PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation}},
  author    = {Zhu, Zhihao and Zheng, Yifan and Pan, Siyu and Jin, Yaohui and Mu, Yao},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {8950--8960},
  url       = {https://mlanthology.org/iccv/2025/zhu2025iccv-pasg/}
}