TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization

Abstract

Robots often struggle to generalize from a single demonstration due to the lack of a transferable and interpretable spatial representation. In this work, we introduce TReF-6, a method that infers a simplified, abstracted 6DoF Task-Relevant Frame from a single trajectory. Our approach identifies an influence point purely from the trajectory geometry to define the origin for a local frame, which serves as a reference for parameterizing a Dynamic Movement Primitive (DMP). This influence point captures the task’s spatial structure, extending the standard DMP formulation beyond start-goal imitation. The inferred frame is semantically grounded via a vision-language model and localized in novel scenes by Grounded-SAM, enabling functionally consistent skill generalization. We validate TReF-6 in simulation and demonstrate robustness to trajectory noise. We further deploy an end-to-end pipeline on real-world manipulation tasks, showing that TReF-6 supports one-shot imitation learning that preserves task intent across diverse object configurations.
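
To make the abstract's pipeline a bit more concrete, the sketch below shows one way an influence point could be extracted purely from trajectory geometry and turned into a local frame for re-expressing a DMP goal. It is a minimal NumPy illustration under assumed conventions: the farthest-point-from-chord heuristic, the frame orientation, and every function name here are hypothetical stand-ins, not the paper's actual TReF-6 formulation.

import numpy as np

def infer_influence_point(traj):
    # Illustrative heuristic (not necessarily the paper's criterion): pick the
    # trajectory point that deviates most from the start-goal chord.
    start, goal = traj[0], traj[-1]
    chord = goal - start
    chord = chord / (np.linalg.norm(chord) + 1e-12)
    offsets = traj - start
    proj = offsets @ chord                      # scalar projection onto the chord
    perp = offsets - np.outer(proj, chord)      # component orthogonal to the chord
    return traj[np.argmax(np.linalg.norm(perp, axis=1))]

def build_frame(origin, traj):
    # Hypothetical 6DoF frame: origin at the influence point, x-axis along the
    # start-goal direction, z-axis normal to the plane spanned by the demo.
    x = traj[-1] - traj[0]
    x = x / (np.linalg.norm(x) + 1e-12)
    z = np.cross(x, origin - traj[0])
    if np.linalg.norm(z) < 1e-9:                # straight-line demo: pick any normal
        z = np.array([0.0, 0.0, 1.0])
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1), origin  # rotation (columns = axes), origin

def to_frame(points, rot, origin):
    # Express world-frame points in the local (task-relevant) frame.
    return (points - origin) @ rot

# Toy demonstration: a reach that bows away from the straight start-goal line.
t = np.linspace(0.0, 1.0, 50)
demo = np.stack([0.5 * t, 0.1 * np.sin(np.pi * t), np.zeros_like(t)], axis=1)

p_influence = infer_influence_point(demo)
rot, origin = build_frame(p_influence, demo)
goal_local = to_frame(demo[-1:], rot, origin)   # DMP goal relative to the frame

In a new scene, re-localizing the frame (e.g., via the vision-language grounding and Grounded-SAM step described above) and mapping goal_local back through it would transfer the demonstrated goal while preserving its relation to the influence point.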

Cite

Text

Ding et al. "TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Ding et al. "TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/ding2025corl-tref6/)

BibTeX

@inproceedings{ding2025corl-tref6,
  title     = {{TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization}},
  author    = {Ding, Yuxuan and Wang, Shuangge and Fitzgerald, Tesca},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  year      = {2025},
  pages     = {5129--5150},
  volume    = {305},
  url       = {https://mlanthology.org/corl/2025/ding2025corl-tref6/}
}