PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations

Abstract

Learning a generalizable object manipulation policy is vital for an embodied agent to work in complex real-world scenes. Parts, as the shared components across different object categories, have the potential to increase the generalization ability of the manipulation policy and enable cross-category object manipulation. In this work, we build the first large-scale, part-based cross-category object manipulation benchmark, PartManip, which is composed of 11 object categories, 494 objects, and 1432 tasks in 6 task classes. Compared to previous work, our benchmark is also more diverse and realistic, i.e., it contains more objects and uses sparse-view point clouds as input without oracle information such as part segmentation. To tackle the difficulties of vision-based policy learning, we first train a state-based expert with our proposed part-based canonicalization and part-aware rewards, and then distill its knowledge to a vision-based student. We also find that an expressive backbone is essential for handling the large diversity of objects. For cross-category generalization, we introduce domain adversarial learning for domain-invariant feature extraction. Extensive experiments in simulation show that our learned policy outperforms other methods by a large margin, especially on unseen object categories. We also demonstrate that our method can successfully manipulate novel objects in the real world.
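The abstract mentions domain adversarial learning for domain-invariant feature extraction. As a rough illustration only (not the authors' implementation), such a component is commonly realized with a gradient reversal layer between the visual backbone and a domain (object-category) discriminator; the sketch below shows that standard pattern in PyTorch, with all module names and dimensions chosen hypothetically.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign so the upstream feature extractor is trained
        # to *confuse* the domain discriminator.
        return -ctx.lambd * grad_output, None


class DomainDiscriminator(nn.Module):
    """Hypothetical discriminator that predicts which object-category domain a feature came from."""

    def __init__(self, feat_dim: int, num_domains: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_domains),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Features pass through the gradient reversal layer before classification.
        return self.net(GradReverse.apply(feat, self.lambd))
```

In this kind of setup, the point-cloud backbone's features feed both the policy head (task loss) and, through the reversal layer, the domain discriminator (cross-entropy over training categories); optimizing both terms jointly pushes the backbone toward features that are useful for manipulation yet indistinguishable across categories.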

Cite

Text

Geng et al. "PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00291

Markdown

[Geng et al. "PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/geng2023cvpr-partmanip/) doi:10.1109/CVPR52729.2023.00291

BibTeX

@inproceedings{geng2023cvpr-partmanip,
  title     = {{PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations}},
  author    = {Geng, Haoran and Li, Ziming and Geng, Yiran and Chen, Jiayi and Dong, Hao and Wang, He},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {2978--2988},
  doi       = {10.1109/CVPR52729.2023.00291},
  url       = {https://mlanthology.org/cvpr/2023/geng2023cvpr-partmanip/}
}