RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective
Abstract
Precise robot manipulation requires rich spatial information in imitation learning, which remains a challenge for both 2D- and 3D-based policies. To tackle this problem, we present RISE, an end-to-end baseline for real-world imitation learning that predicts continuous actions directly from single-view point clouds. It compresses the point cloud into tokens with a sparse 3D encoder. After adding sparse positional encoding, the tokens are featurized using a transformer. Finally, the features are decoded into robot actions by a diffusion head. Trained with 50 demonstrations per real-world task, RISE surpasses currently representative 2D and 3D policies by a large margin, showcasing significant advantages in both accuracy and efficiency. Project website: rise-policy.github.io.
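The abstract describes a four-stage pipeline: sparse 3D encoding of the point cloud into tokens, sparse positional encoding, a transformer over the tokens, and a diffusion action head. The sketch below illustrates only the data flow with NumPy stand-ins; every dimension, function name, and internal computation here is a hypothetical placeholder (the paper's encoder, transformer, and diffusion head are learned networks), not the authors' implementation.

```python
import numpy as np

# Hypothetical dimensions -- illustrative only, not from the paper.
FEAT_DIM = 128    # per-token feature size
ACTION_DIM = 10   # e.g. translation (3) + rotation (6) + gripper (1)
HORIZON = 16      # length of the predicted continuous-action chunk

def sparse_encode(points, voxel_size=0.01):
    """Stand-in for the sparse 3D encoder: voxelize the cloud and
    project the mean point features of each occupied voxel to a token."""
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    proj = np.random.default_rng(0).standard_normal((points.shape[1], FEAT_DIM))
    feats = np.stack([points[inv == i].mean(axis=0) @ proj
                      for i in range(len(uniq))])
    return uniq, feats  # (M, 3) voxel coords, (M, FEAT_DIM) tokens

def add_positional_encoding(coords, feats):
    """Stand-in sparse positional encoding derived from voxel coordinates."""
    pos_proj = np.random.default_rng(1).standard_normal((3, FEAT_DIM))
    return feats + coords @ pos_proj

def transformer(tokens):
    """Placeholder for the transformer: mean-pool tokens into one feature."""
    return tokens.mean(axis=0)

def diffusion_head(feature):
    """Placeholder for the diffusion head: map the pooled feature
    to a chunk of continuous robot actions."""
    head = np.random.default_rng(2).standard_normal((FEAT_DIM, HORIZON * ACTION_DIM))
    return (feature @ head).reshape(HORIZON, ACTION_DIM)

def rise_forward(point_cloud):
    coords, tokens = sparse_encode(point_cloud)   # point cloud -> tokens
    tokens = add_positional_encoding(coords, tokens)
    feature = transformer(tokens)                 # tokens -> feature
    return diffusion_head(feature)                # feature -> actions

cloud = np.random.default_rng(3).uniform(size=(2048, 6))  # xyz + rgb
actions = rise_forward(cloud)
print(actions.shape)  # (16, 10)
```

The sketch makes the single-view, end-to-end nature of the pipeline concrete: one unstructured point cloud in, one fixed-length chunk of continuous actions out, with no intermediate 2D image representation.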
Cite
Text
Wang et al. "RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective." ICML 2024 Workshops: MFM-EAI, 2024.
Markdown
[Wang et al. "RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective." ICML 2024 Workshops: MFM-EAI, 2024.](https://mlanthology.org/icmlw/2024/wang2024icmlw-rise/)
BibTeX
@inproceedings{wang2024icmlw-rise,
title = {{RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective}},
author = {Wang, Chenxi and Fang, Hongjie and Fang, Hao-Shu and Lu, Cewu},
booktitle = {ICML 2024 Workshops: MFM-EAI},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/wang2024icmlw-rise/}
}