TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization
Abstract
Robots often struggle to generalize from a single demonstration due to the lack of a transferable and interpretable spatial representation. In this work, we introduce TReF-6, a method that infers a simplified, abstracted 6DoF Task-Relevant Frame from a single trajectory. Our approach identifies an influence point purely from the trajectory geometry to define the origin for a local frame, which serves as a reference for parameterizing a Dynamic Movement Primitive (DMP). This influence point captures the task’s spatial structure, extending the standard DMP formulation beyond start-goal imitation. The inferred frame is semantically grounded via a vision-language model and localized in novel scenes by Grounded-SAM, enabling functionally consistent skill generalization. We validate TReF-6 in simulation and demonstrate robustness to trajectory noise. We further deploy an end-to-end pipeline on real-world manipulation tasks, showing that TReF-6 supports one-shot imitation learning that preserves task intent across diverse object configurations.
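To make the abstract's DMP extension concrete, here is a minimal, illustrative 1-D sketch (not the authors' implementation): a demonstration is re-expressed relative to a hypothetical influence-point frame origin before fitting the DMP forcing term, so the reproduced motion follows the frame when it moves in a new scene. The names `fit_forcing`, `frame_origin`, and `new_origin`, and the nearest-neighbor forcing lookup (in place of the usual basis-function regression), are assumptions made for brevity.

```python
# Minimal sketch, assuming a standard discrete DMP; names and structure are
# illustrative and not TReF-6's actual API.
import numpy as np

def canonical(alpha_s, n, dt, tau):
    # Canonical system: phase variable s decaying from 1 toward 0.
    s = np.empty(n)
    s[0] = 1.0
    for t in range(1, n):
        s[t] = s[t - 1] - alpha_s * s[t - 1] * dt / tau
    return s

def fit_forcing(y, dt, tau, alpha=25.0, beta=6.25):
    # Invert the transformation system to recover the forcing term from a demo:
    # f = tau^2 * ydd - alpha * (beta * (g - y) - tau * yd)
    yd = np.gradient(y, dt)
    ydd = np.gradient(yd, dt)
    g = y[-1]
    return tau**2 * ydd - alpha * (beta * (g - y) - tau * yd)

def rollout(f_target, s_demo, y0, g, dt, tau, alpha=25.0, beta=6.25, alpha_s=4.0):
    n = len(f_target)
    s = canonical(alpha_s, n, dt, tau)
    y, yd = y0, 0.0
    out = np.empty(n)
    for t in range(n):
        # Nearest-neighbor lookup of the forcing term along the canonical phase,
        # standing in for basis-function regression to keep the sketch short.
        f = f_target[np.argmin(np.abs(s_demo - s[t]))]
        ydd = (alpha * (beta * (g - y) - tau * yd) + f) / tau**2
        yd += ydd * dt
        y += yd * dt
        out[t] = y
    return out

# Demo trajectory expressed in a (hypothetical) task-relevant frame: subtracting
# the assumed influence point makes the learned shape frame-relative.
dt, tau = 0.01, 1.0
t = np.arange(0.0, tau, dt)
frame_origin = 0.3                       # assumed influence point (1-D for brevity)
demo_world = 0.3 + 0.2 * np.sin(np.pi * t)
demo_local = demo_world - frame_origin   # express the demo in the task frame

s_demo = canonical(4.0, len(t), dt, tau)
f = fit_forcing(demo_local, dt, tau)

# Reproduce the skill in a novel scene where the inferred frame has moved.
new_origin = 0.8
repro_world = rollout(f, s_demo, demo_local[0], demo_local[-1], dt, tau) + new_origin
print(repro_world[:5])
```

Because the forcing term is learned in the local frame, relocating the frame origin shifts the whole reproduced trajectory while preserving its shape, which is the intuition behind going beyond start-goal imitation.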
Cite
Text
Ding et al. "TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization." Proceedings of The 9th Conference on Robot Learning, 2025.
Markdown
[Ding et al. "TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/ding2025corl-tref6/)
BibTeX
@inproceedings{ding2025corl-tref6,
title = {{TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization}},
author = {Ding, Yuxuan and Wang, Shuangge and Fitzgerald, Tesca},
booktitle = {Proceedings of The 9th Conference on Robot Learning},
year = {2025},
pages = {5129--5150},
volume = {305},
url = {https://mlanthology.org/corl/2025/ding2025corl-tref6/}
}