SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Abstract

Although spatial reasoning has made progress on object localization relationships, it often overlooks object orientation, a key factor in fine-grained 6-DoF manipulation. Traditional pose representations rely on pre-defined frames or templates, which limits generalization and semantic grounding. In this paper, we introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner (e.g., the "plug-in" direction of a USB or the "handle" direction of a cup). To support this, we construct OrienText300K, a large-scale dataset of 3D objects annotated with semantic orientations, and develop PointSO, a general model for zero-shot semantic orientation prediction. By integrating semantic orientation into VLM agents, our SoFar framework enables 6-DoF spatial reasoning and generates robotic actions. Extensive experiments demonstrate the effectiveness and generalization of SoFar, e.g., a zero-shot success rate of 48.7% on Open6DOR and 74.9% on SIMPLER-Env.

Cite

Text

Qi et al. "SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Qi et al. "SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/qi2025neurips-sofar/)

BibTeX

@inproceedings{qi2025neurips-sofar,
  title     = {{SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation}},
  author    = {Qi, Zekun and Zhang, Wenyao and Ding, Yufei and Dong, Runpei and Yu, XinQiang and Li, Jingwen and Xu, Lingyun and Li, Baoyu and He, Xialin and Fan, Guofan and Zhang, Jiazhao and He, Jiawei and Gu, Jiayuan and Jin, Xin and Ma, Kaisheng and Zhang, Zhizheng and Wang, He and Yi, Li},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/qi2025neurips-sofar/}
}