What's the Move? Hybrid Imitation Learning via Salient Points
Abstract
While imitation learning (IL) offers a promising framework for teaching robots various behaviors, learning complex tasks remains challenging. Existing IL policies struggle to generalize effectively across visual and spatial variations even for simple tasks. In this work, we introduce **SPHINX**: **S**alient **P**oint-based **H**ybrid **I**mitatio**N** and e**X**ecution, a flexible IL policy that leverages multimodal observations (point clouds and wrist images), along with a hybrid action space of low-frequency, sparse waypoints and high-frequency, dense end-effector movements. Given 3D point cloud observations, SPHINX learns to infer task-relevant points within a point cloud, or *salient points*, which support spatial generalization by focusing on semantically meaningful features. These salient points serve as anchor points to predict waypoints for long-range movement, such as reaching target poses in free space. Once near a salient point, SPHINX learns to switch to predicting dense end-effector movements given close-up wrist images for precise phases of a task. By exploiting the strengths of different input modalities and action representations for different manipulation phases, SPHINX tackles complex tasks in a sample-efficient, generalizable manner. Our method achieves **86.7%** success across 4 real-world and 2 simulated tasks, outperforming the next best state-of-the-art IL baseline by **41.1%** on average across **440** real-world trials. SPHINX additionally generalizes to novel viewpoints, visual distractors, spatial arrangements, and execution speeds with a **1.7x** speedup over the most competitive baseline. Our website (http://sphinx-manip.github.io) provides open-sourced code for data collection, training, and evaluation, along with supplementary videos.
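The hybrid action space described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: in SPHINX the switch between modes is learned, whereas the sketch below substitutes a fixed distance threshold purely for illustration. The function names, the `switch_radius` parameter, and the idea of expressing a waypoint as an offset from the salient point are all assumptions introduced here.

```python
import numpy as np


def select_mode(ee_pos, salient_point, switch_radius=0.05):
    """Pick an action mode from the end-effector's distance to a salient point.

    ASSUMPTION: a fixed radius (in meters) stands in for the learned mode
    switch described in the abstract.
    """
    dist = float(np.linalg.norm(np.asarray(ee_pos, dtype=float)
                                - np.asarray(salient_point, dtype=float)))
    # Far away: sparse waypoint mode. Close by: dense end-effector mode.
    return "dense" if dist <= switch_radius else "waypoint"


def waypoint_from_salient_point(salient_point, offset):
    """Anchor a waypoint position to a salient point.

    ASSUMPTION: the waypoint is the salient point plus a predicted 3D offset;
    a real policy would also predict orientation and gripper state.
    """
    return np.asarray(salient_point, dtype=float) + np.asarray(offset, dtype=float)


if __name__ == "__main__":
    salient = [0.40, 0.10, 0.20]  # hypothetical salient point on an object
    print(select_mode([0.0, 0.0, 0.5], salient))      # far from the point
    print(select_mode([0.40, 0.10, 0.22], salient))   # within the radius
    print(waypoint_from_salient_point(salient, [0.0, 0.0, 0.05]))
```

Anchoring the waypoint to a point in the cloud, rather than regressing an absolute pose, is one plausible reading of how salient points could help spatial generalization: if the object moves, the anchor moves with it.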
Cite
Text
Sundaresan et al. "What's the Move? Hybrid Imitation Learning via Salient Points." International Conference on Learning Representations, 2025.
Markdown
[Sundaresan et al. "What's the Move? Hybrid Imitation Learning via Salient Points." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/sundaresan2025iclr-move/)
BibTeX
@inproceedings{sundaresan2025iclr-move,
title = {{What's the Move? Hybrid Imitation Learning via Salient Points}},
author = {Sundaresan, Priya and Hu, Hengyuan and Vuong, Quan and Bohg, Jeannette and Sadigh, Dorsa},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/sundaresan2025iclr-move/}
}