OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation

Abstract

Aerial Vision-Language Navigation (VLN) seeks to guide UAVs by leveraging language instructions and visual cues, establishing a new paradigm for human-UAV interaction. However, collecting VLN data demands extensive human effort to construct trajectories and corresponding instructions, hindering the development of large-scale datasets and capable models. To address this problem, we propose OpenFly, a comprehensive platform for aerial VLN. First, OpenFly integrates four rendering engines and advanced techniques for diverse environment simulation, including Unreal Engine, GTA V, Google Earth, and 3D Gaussian Splatting (3D GS). In particular, 3D GS supports real-to-sim rendering, further enhancing the realism of our environments. Second, we develop a highly automated toolchain for aerial VLN data collection, streamlining point cloud acquisition, scene semantic segmentation, flight trajectory creation, and instruction generation. Third, based on the toolchain, we construct a large-scale aerial VLN dataset with 100k trajectories, covering diverse scenarios and assets across 18 scenes. Moreover, we propose OpenFly-Agent, a keyframe-aware VLN model that emphasizes key observations to improve performance and reduce computation. For benchmarking, we conduct extensive experiments and analyses, in which our navigation success rate outperforms others by 14.0% and 7.9% on the seen and unseen scenarios, respectively. The toolchain, dataset, and code will be open-sourced.

Cite

Text

Gao et al. "OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation." International Conference on Learning Representations, 2026.

Markdown

[Gao et al. "OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/gao2026iclr-openfly/)

BibTeX

@inproceedings{gao2026iclr-openfly,
  title     = {{OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation}},
  author    = {Gao, Yunpeng and Li, Chenhui and You, Zhongrui and Liu, Junli and Zhen, Li and Chen, Pengan and Chen, Qizhi and Tang, Zhonghan and Wang, Liansheng and {Yangpenghui} and Tang, Yiwen and Tang, Yuhang and Liang, Shuai and Zhu, Songyi and Xiong, Ziqin and Su, Yifei and Ye, Xinyi and Li, Jianan and Ding, Yan and Wang, Dong and Wang, Zhigang and Zhao, Bin and Li, Xuelong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/gao2026iclr-openfly/}
}