PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models

Abstract

Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm. We also utilize multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
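The abstract outlines a three-stage pipeline: render the point cloud from multiple views, run GLIP's open-vocabulary detector on each rendering, and lift the resulting 2D part boxes back onto the 3D points. As a rough illustration of the lifting step only, the sketch below assigns each point the label it receives most often across views. This is a simplified assumption, not the paper's actual lifting algorithm, which further exploits multi-view 3D priors; all names here (lift_labels, points, views) are illustrative.

# A minimal sketch (not the paper's algorithm) of lifting per-view 2D part
# detections onto a 3D point cloud by majority voting across views.
# Assumptions: `points` is an (N, 3) array; `views` is a list of
# (project, detections) pairs, where `project` maps (N, 3) world coordinates
# to (N, 2) pixel coordinates for that view, and `detections` is a list of
# (label, (x_min, y_min, x_max, y_max)) boxes with integer part labels,
# as produced by an open-vocabulary detector such as GLIP.

from collections import Counter

import numpy as np


def lift_labels(points, views, unlabeled=-1):
    """Assign each 3D point the part label it receives most often across views."""
    votes = [Counter() for _ in range(len(points))]
    for project, detections in views:
        pixels = project(points)  # (N, 2) pixel coordinates in this view
        for label, (x0, y0, x1, y1) in detections:
            # Vote for every point whose projection falls inside the 2D box.
            inside = (
                (pixels[:, 0] >= x0) & (pixels[:, 0] <= x1)
                & (pixels[:, 1] >= y0) & (pixels[:, 1] <= y1)
            )
            for idx in np.flatnonzero(inside):
                votes[idx][label] += 1
    # Points never covered by any detection box stay unlabeled.
    return np.array(
        [c.most_common(1)[0][0] if c else unlabeled for c in votes]
    )

Note that naive per-point voting like this is ambiguous: a 2D box also covers background points projecting behind the detected part, which is presumably the kind of ambiguity the abstract's "multi-view 3D priors" are designed to resolve.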

Cite

Text

Liu et al. "PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.02082

Markdown

[Liu et al. "PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/liu2023cvpr-partslip/) doi:10.1109/CVPR52729.2023.02082

BibTeX

@inproceedings{liu2023cvpr-partslip,
  title     = {{PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models}},
  author    = {Liu, Minghua and Zhu, Yinhao and Cai, Hong and Han, Shizhong and Ling, Zhan and Porikli, Fatih and Su, Hao},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {21736--21746},
  doi       = {10.1109/CVPR52729.2023.02082},
  url       = {https://mlanthology.org/cvpr/2023/liu2023cvpr-partslip/}
}