Learning to Navigate in Open Urban Environments Using a Simple Sim2Real Strategy
Abstract
Autonomous navigation in open, dynamic urban environments poses unique challenges due to unstructured instructions, complex layouts, and moving obstacles. We propose Real-Nav, a unified vision-and-language navigation framework that operates seamlessly indoors and outdoors by tightly integrating semantic mapping with multimodal alignment. For real-world deployment, we employ a simple simulation-to-reality adaptation strategy based on social-aware decision modules. Furthermore, to efficiently exploit the 3D semantic information of the space to be explored, our model includes an additional pre-exploration stage. We construct Tsinghua-roads, a virtual environment simulator built from real photographic data collected at Tsinghua University, train Real-Nav on this simulator, and then evaluate it on challenging vision-and-language navigation benchmarks as well as in a real-world campus setting. Our results demonstrate that building and exploiting semantic maps and employing curiosity-driven target candidate screening can significantly boost embodied navigation performance in both simulated and real-world environments.
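The abstract does not include implementation details, but a minimal sketch may help make "curiosity-driven target candidate screening" over a semantic map concrete. Everything below is a hypothetical illustration, not the paper's code: the `SemanticMap` class, the count-based novelty bonus, and the `beta` alignment-vs-curiosity weight are all assumptions introduced for exposition.

```python
# Illustrative sketch only. The paper mentions curiosity-driven target
# candidate screening over a semantic map; this toy version combines a
# precomputed instruction-alignment score with a count-based novelty bonus.
# All names and the scoring rule are hypothetical.
import math
from collections import defaultdict

class SemanticMap:
    """Toy 2D semantic map with per-cell visitation counts."""
    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.visits = defaultdict(int)   # (i, j) -> visit count
        self.labels = {}                 # (i, j) -> semantic label

    def cell(self, xy):
        return (int(xy[0] // self.cell_size), int(xy[1] // self.cell_size))

    def observe(self, xy, label=None):
        c = self.cell(xy)
        self.visits[c] += 1
        if label is not None:
            self.labels[c] = label

    def novelty(self, xy):
        # Count-based curiosity bonus: rarely visited cells score higher.
        return 1.0 / math.sqrt(1 + self.visits[self.cell(xy)])

def screen_candidates(candidates, alignment_scores, smap, beta=0.5, top_k=3):
    """Rank candidate waypoints by instruction alignment plus curiosity.

    candidates       : list of (x, y) waypoint positions
    alignment_scores : precomputed instruction-image alignment per candidate
    beta             : hypothetical weight trading alignment vs. novelty
    """
    scored = [
        (align + beta * smap.novelty(xy), xy)
        for xy, align in zip(candidates, alignment_scores)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [xy for _, xy in scored[:top_k]]

# Usage: a slightly less-aligned waypoint wins when it is far less explored.
smap = SemanticMap()
for _ in range(5):
    smap.observe((0.5, 0.5), label="road")   # heavily revisited cell
smap.observe((3.5, 1.5), label="building")    # barely visited cell
cands = [(0.5, 0.5), (3.5, 1.5)]
print(screen_candidates(cands, alignment_scores=[0.9, 0.8], smap=smap))
```

In this sketch the novelty bonus decays with the square root of the visit count, so screening initially favors unexplored regions but converges to pure instruction alignment as the map fills in; the actual screening rule used by Real-Nav may differ.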
Cite
Text
He et al. "Learning to Navigate in Open Urban Environments Using a Simple Sim2Real Strategy." ICLR 2025 Workshops: EmbodiedAI, 2025.
Markdown
[He et al. "Learning to Navigate in Open Urban Environments Using a Simple Sim2Real Strategy." ICLR 2025 Workshops: EmbodiedAI, 2025.](https://mlanthology.org/iclrw/2025/he2025iclrw-learning/)
BibTeX
@inproceedings{he2025iclrw-learning,
  title     = {{Learning to Navigate in Open Urban Environments Using a Simple Sim2Real Strategy}},
  author    = {He, Lixuan and Dong, Haoyu and Yu, Yangcheng and Chen, Zhenxing and Feng, Jie and Wang, Xin and Li, Yong},
  booktitle = {ICLR 2025 Workshops: EmbodiedAI},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/he2025iclrw-learning/}
}