Take-a-Photo: 3D-to-2D Generative Pre-Training of Point Cloud Models

Wang, Ziyi; Yu, Xumin; Rao, Yongming; Zhou, Jie; Lu, Jiwen

doi:10.1109/ICCV51070.2023.00519

ICCV 2023 pp. 5640-5650

Abstract

With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TakeAPhoto.
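
The abstract describes generating view images from instructed poses via cross-attention. The following is a minimal NumPy sketch of that general idea only, not the authors' implementation: pose-conditioned patch queries attend over per-point features, and the attended features are projected to pixel values. Every name, shape, and weight here (`point_features`, `pose`, `W_pose`, `patch_pos`, `W_out`, the random target image, and the mean-squared-error loss) is a hypothetical stand-in chosen for illustration; the official code at the repository above should be consulted for the actual architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query row attends over all key rows.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
num_points, dim = 128, 32           # assumed sizes, not from the paper
num_patches, patch_pixels = 16, 48  # e.g. a 4x4 grid of 4x4 RGB patches

# Stand-in for the output of any 3D backbone: one feature vector per point.
point_features = rng.normal(size=(num_points, dim))

# Hypothetical pose encoding: (azimuth, elevation) in radians.
pose = np.array([0.5, 0.2])
W_pose = rng.normal(size=(2, dim))               # assumed pose projection
patch_pos = rng.normal(size=(num_patches, dim))  # assumed patch embeddings

# Pose-conditioned queries: one query per image patch of the requested view.
queries = patch_pos + pose @ W_pose

# Queries come from the 2D side; keys and values come from the 3D features.
attended, weights = cross_attention(queries, point_features, point_features)

# Project attended features to pixel values for each patch.
W_out = rng.normal(size=(dim, patch_pixels))
predicted = attended @ W_out

# Placeholder target view; a real setup would use a rendered image.
target = rng.uniform(size=(num_patches, patch_pixels))
loss = np.mean((predicted - target) ** 2)  # assumed reconstruction loss
```

In a sketch like this the backbone only needs to emit a set of per-point features, which is consistent with the abstract's claim that the scheme is adaptable to any point cloud model; the weights would be learned rather than random in practice.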

Cite

Text

Wang et al. "Take-a-Photo: 3D-to-2D Generative Pre-Training of Point Cloud Models." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00519

Markdown

[Wang et al. "Take-a-Photo: 3D-to-2D Generative Pre-Training of Point Cloud Models." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/wang2023iccv-takeaphoto/) doi:10.1109/ICCV51070.2023.00519

BibTeX

@inproceedings{wang2023iccv-takeaphoto,
  title     = {{Take-a-Photo: 3D-to-2D Generative Pre-Training of Point Cloud Models}},
  author    = {Wang, Ziyi and Yu, Xumin and Rao, Yongming and Zhou, Jie and Lu, Jiwen},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {5640-5650},
  doi       = {10.1109/ICCV51070.2023.00519},
  url       = {https://mlanthology.org/iccv/2023/wang2023iccv-takeaphoto/}
}