OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-Aware Reasoning

Liu, Yuecheng; Chi, DaFeng; Wu, Shiguang; Zhang, Zhanguang; Zhuang, Yuzheng; Yang, Bowen; Zhu, He; Zhang, Lingfeng; Xie, Pengwei; Bravo, David Gamaliel Arcos; Zhang, Yingxue; Hao, Jianye; Quan, Xingyue

ICLR 2026

Abstract

Recent advances in multimodal large language models (MLLMs) have opened new opportunities for embodied intelligence, enabling multimodal understanding, reasoning, and interaction, as well as continuous spatial decision-making. Nevertheless, current MLLM-based embodied systems face two critical limitations. First, the **Geometric Adaptability Gap**: models trained solely on 2D inputs or with hard-coded 3D geometry injection suffer from either insufficient spatial information or restricted 2D generalization, leading to poor adaptability across tasks with diverse spatial demands. Second, the **Embodiment Constraint Gap**: prior work often neglects the physical constraints of real robots, resulting in task plans that are theoretically valid but practically infeasible. To address these gaps, we introduce **OmniEVA**, an embodied versatile planner that enables advanced embodied reasoning and task planning through two pivotal innovations: (1) a **Task-Adaptive 3D Grounding** mechanism, which uses a gated router to dynamically inject 3D features based on task context, enabling selective geometric reasoning; and (2) an **Embodiment-Aware Reasoning** framework that incorporates task goals and physical constraints into the reasoning loop, ensuring executable plans. Extensive experiments show that OmniEVA achieves state-of-the-art performance on 7 of 8 embodied reasoning benchmarks and excels in downstream tasks such as object navigation and mobile manipulation. Evaluations on the proposed primitive and composite benchmarks confirm its robust and versatile planning capabilities.
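
The idea of a gated router that injects 3D features only when the task calls for them can be illustrated with a small sketch. It is not OmniEVA's architecture. It assumes a single linear router over a task embedding, a sigmoid gate (with an optional hard threshold), and residual addition of 3D features that have already been projected to the visual token width. The names `task_gate`, `fuse_tokens`, the router weights, and the toy embeddings are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def task_gate(task_embedding: Vector, weights: Vector, bias: float,
              hard: bool = False, threshold: float = 0.5) -> float:
    """Return a gate in [0, 1] saying how much 3D geometry this task receives.

    Hypothetical router: one linear layer followed by a sigmoid.
    With hard=True the gate is snapped to 0.0 or 1.0 at `threshold`.
    """
    if len(task_embedding) != len(weights):
        raise ValueError("task embedding and router weights must have equal length")
    logit = sum(t * w for t, w in zip(task_embedding, weights)) + bias
    gate = sigmoid(logit)
    if hard:
        return 1.0 if gate >= threshold else 0.0
    return gate


def fuse_tokens(visual_tokens: Sequence[Vector], geo_tokens: Sequence[Vector],
                gate: float) -> List[List[float]]:
    """Residual injection: visual + gate * geometry, token by token.

    Assumes the 3D features were already projected to the visual token width.
    """
    if len(visual_tokens) != len(geo_tokens):
        raise ValueError("need one 3D feature vector per visual token")
    fused = []
    for v_tok, g_tok in zip(visual_tokens, geo_tokens):
        if len(v_tok) != len(g_tok):
            raise ValueError("visual and 3D feature widths differ")
        fused.append([v + gate * g for v, g in zip(v_tok, g_tok)])
    return fused


if __name__ == "__main__":
    router_w, router_b = [4.0, -4.0], 0.0  # hypothetical learned router parameters
    spatial_task = [1.0, 0.0]   # toy embedding for a geometry-heavy query
    semantic_task = [0.0, 1.0]  # toy embedding for a purely semantic query
    visual = [[1.0, 2.0], [3.0, 4.0]]
    geometry = [[0.5, 0.5], [1.0, -1.0]]
    for name, task in [("spatial", spatial_task), ("semantic", semantic_task)]:
        g = task_gate(task, router_w, router_b, hard=True)
        print(name, g, fuse_tokens(visual, geometry, g))
```

A soft gate keeps the router differentiable during training; the hard variant shows how a task that needs no geometry can skip injection entirely, leaving the 2D visual tokens unchanged, which is one way to read "selective geometric reasoning".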

Cite

Text

Liu et al. "OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-Aware Reasoning." International Conference on Learning Representations, 2026.

Markdown

[Liu et al. "OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-Aware Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-omnieva/)

BibTeX

@inproceedings{liu2026iclr-omnieva,
  title     = {{OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-Aware Reasoning}},
  author    = {Liu, Yuecheng and Chi, DaFeng and Wu, Shiguang and Zhang, Zhanguang and Zhuang, Yuzheng and Yang, Bowen and Zhu, He and Zhang, Lingfeng and Xie, Pengwei and Bravo, David Gamaliel Arcos and Zhang, Yingxue and Hao, Jianye and Quan, Xingyue},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/liu2026iclr-omnieva/}
}