Towards Physically Executable 3D Gaussian for Embodied Navigation
Abstract
3D Gaussian Splatting (3DGS), a 3D representation method with photorealistic real-time rendering capabilities, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks the fine-grained semantics and physical executability needed for Visual-Language Navigation (VLN). To address this, we propose **SAGE-3D** (**S**emantically and Physically **A**ligned **G**aussian **E**nvironments for **3D** Navigation), a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: **(1) Object-Centric Semantic Grounding**, which adds fine-grained object-level annotations to 3DGS; and **(2) Physics-Aware Execution Jointing**, which embeds collision objects into 3DGS and constructs rich physical interfaces. We release **InteriorGS**, containing 1K object-annotated 3DGS indoor scenes, and introduce **SAGE-Bench**, the first 3DGS-based VLN benchmark, with 2M VLN data samples. Experiments show that training on 3DGS scene data is harder to converge yet generalizes strongly, improving baseline performance by 31% on the VLN-CE Unseen task.
Cite
Text
Miao et al. "Towards Physically Executable 3D Gaussian for Embodied Navigation." International Conference on Learning Representations, 2026.
Markdown
[Miao et al. "Towards Physically Executable 3D Gaussian for Embodied Navigation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/miao2026iclr-physically/)
BibTeX
@inproceedings{miao2026iclr-physically,
title = {{Towards Physically Executable 3D Gaussian for Embodied Navigation}},
author = {Miao, Bingchen and Wei, Rong and Ge, Zhiqi and Sun, Xiaoquan and Gao, Shiqi and Zhu, Jingzhe and Wang, Renhan and Tang, Siliang and Xiao, Jun and Tang, Rui and Li, Juncheng},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/miao2026iclr-physically/}
}