RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction

Zhong, Yufeng; Feng, Chengjian; Yan, Feng; Liu, Fanfan; Zheng, Liming; Ma, Lin

ICCV 2025 pp. 6416-6425

Abstract

In language-guided visual navigation, agents locate target objects in unseen environments using natural language instructions. For reliable navigation in unfamiliar scenes, agents should possess strong perception, planning, and prediction capabilities. Additionally, when agents revisit previously explored areas during long-term navigation, they may retain irrelevant and redundant historical perceptions, leading to suboptimal results. In this work, we propose RoboTron-Nav, a unified framework that integrates perception, planning, and prediction capabilities through multitask collaboration on navigation and embodied question answering tasks, thereby enhancing navigation performance. Furthermore, RoboTron-Nav employs an adaptive 3D-aware history sampling strategy to utilize historical observations effectively and efficiently. By leveraging a large language model, RoboTron-Nav comprehends diverse commands and complex visual scenes, resulting in appropriate navigation actions. RoboTron-Nav achieves an 81.1% success rate in object goal navigation on the CHORES-$\mathbb{S}$ benchmark, setting a new state of the art.

Cite

Text

Zhong et al. "RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction." International Conference on Computer Vision, 2025.

Markdown

[Zhong et al. "RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhong2025iccv-robotronnav/)

BibTeX

@inproceedings{zhong2025iccv-robotronnav,
  title     = {{RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction}},
  author    = {Zhong, Yufeng and Feng, Chengjian and Yan, Feng and Liu, Fanfan and Zheng, Liming and Ma, Lin},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {6416--6425},
  url       = {https://mlanthology.org/iccv/2025/zhong2025iccv-robotronnav/}
}