TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Zheng, Ruijie; Liang, Yongyuan; Huang, Shuaiyi; Gao, Jianfeng; Daumé III, Hal; Kolobov, Andrey; Huang, Furong; Yang, Jianwei

ICLR 2025

Abstract

Although large vision-language-action (VLA) models pretrained on extensive robot datasets offer promising generalist policies for robotic learning, they still struggle with spatial-temporal dynamics in interactive robotics, which makes them less effective at complex tasks such as manipulation. In this work, we introduce visual trace prompting, a simple yet effective approach that enhances VLA models' spatial-temporal awareness for action prediction by encoding state-action trajectories visually. We develop a new model, TraceVLA, by finetuning OpenVLA with visual trace prompting on a dataset of 150K robot manipulation trajectories that we collected ourselves. Evaluations of TraceVLA across 137 configurations in SimplerEnv and 4 tasks on a physical WidowX robot demonstrate state-of-the-art performance: TraceVLA outperforms OpenVLA by 10% on SimplerEnv and by 3.5x on real-robot tasks, and it generalizes robustly across diverse embodiments and scenarios. To further validate the effectiveness and generality of our method, we present a compact VLA model based on the 4B-parameter Phi-3-Vision, pretrained on the Open X-Embodiment dataset and finetuned on our dataset, which rivals the 7B OpenVLA baseline while significantly improving inference efficiency.
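The central idea of visual trace prompting, encoding recent motion by drawing it directly onto the current observation, can be illustrated with a minimal NumPy sketch. It assumes point tracks are already available as an (N, T, 2) array of pixel coordinates (for example from an off-the-shelf point tracker). The function names `overlay_traces` and `draw_segment`, the motion threshold `min_motion`, and the color palette are hypothetical choices for illustration; this is not the authors' implementation or data pipeline.

```python
import numpy as np


def draw_segment(img, p0, p1, color):
    """Rasterize the straight segment p0 -> p1, given as (x, y) pixel coords, into img in place."""
    x0, y0 = p0
    x1, y1 = p1
    # Sample at least one point per pixel along the longer axis so the line has no gaps.
    n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
    xs = np.rint(np.linspace(x0, x1, n)).astype(int)
    ys = np.rint(np.linspace(y0, y1, n)).astype(int)
    h, w = img.shape[:2]
    # Silently drop samples that fall outside the image.
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    img[ys[keep], xs[keep]] = color


def overlay_traces(frame, tracks, min_motion=2.0, colors=None):
    """Draw the recent trajectories of moving points onto a copy of frame.

    frame:  (H, W, 3) uint8 observation image.
    tracks: (N, T, 2) float array; tracks[i, t] is the (x, y) pixel position of
            point i at step t, oldest step first (assumed to come from a point tracker).
    Points whose path length over the T steps is below min_motion pixels are
    treated as static and skipped (a hypothetical filter), so only moving parts are traced.
    """
    if colors is None:
        colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
    out = frame.copy()  # leave the raw observation untouched
    # Total distance travelled by each point across consecutive steps.
    path_len = np.linalg.norm(np.diff(tracks, axis=1), axis=-1).sum(axis=1)
    for i in np.flatnonzero(path_len >= min_motion):
        color = colors[i % len(colors)]
        # Connect consecutive positions into a polyline.
        for t in range(tracks.shape[1] - 1):
            draw_segment(out, tracks[i, t], tracks[i, t + 1], color)
    return out


# Usage: one point slides right along row 1, another stays still at (x=3, y=6).
frame = np.zeros((8, 8, 3), dtype=np.uint8)
tracks = np.array([
    [[1, 1], [3, 1], [5, 1]],  # moving point
    [[3, 6], [3, 6], [3, 6]],  # static point
], dtype=float)
traced = overlay_traces(frame, tracks)
```

The sketch draws on a copy rather than modifying the frame in place, so the raw observation stays available alongside the traced image; filtering out near-static points keeps the overlay focused on the parts of the scene that are actually moving.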

Cite

Text

Zheng et al. "TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies." International Conference on Learning Representations, 2025.

Markdown

[Zheng et al. "TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/zheng2025iclr-tracevla/)

BibTeX

@inproceedings{zheng2025iclr-tracevla,
  title     = {{TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies}},
  author    = {Zheng, Ruijie and Liang, Yongyuan and Huang, Shuaiyi and Gao, Jianfeng and Daumé III, Hal and Kolobov, Andrey and Huang, Furong and Yang, Jianwei},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/zheng2025iclr-tracevla/}
}