Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs

Abstract

Cross-modal training using 2D-3D paired datasets, such as those containing multi-view images and 3D scene scans, presents an effective way to enhance 2D scene understanding by introducing geometric and view-invariance priors into 2D features. However, the need for large-scale scene datasets can impede scalability and further improvements. This paper explores an alternative learning method by leveraging a lightweight and publicly available type of 3D data in the form of CAD models. We construct a 3D space with geometric-aware alignment where the similarity in this space reflects the geometric similarity of CAD models based on the Chamfer distance. The acquired geometric-aware properties are then induced into 2D features, which boost performance on downstream tasks more effectively than existing RGB-CAD approaches. Our technique is not limited to paired RGB-CAD datasets. By training exclusively on pseudo pairs generated from CAD-based reconstruction methods, we enhance the performance of SOTA 2D pre-trained models that use ResNet-50 or ViT-B backbones on various 2D understanding tasks. We also achieve comparable results to SOTA methods trained on scene scans on four tasks in NYUv2, SUNRGB-D, indoor ADE20k, and indoor/outdoor COCO, despite using lightweight CAD models or pseudo data.
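The Chamfer distance that the abstract uses as its measure of geometric similarity can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common symmetric, squared-Euclidean variant over point sets sampled from two CAD models, and the names `chamfer_distance`, `cube`, and `shifted` are hypothetical. How the paper turns this distance into similarity in the learned 3D space is not specified in the abstract and is not shown here.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumption: squared-Euclidean variant, averaged per direction and summed.
    Other variants (unsquared, max instead of sum) are also in use.
    """
    def one_way(src, dst):
        # Mean squared distance from each point in src to its nearest point in dst.
        total = 0.0
        for s in src:
            total += min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
        return total / len(src)

    return one_way(a, b) + one_way(b, a)


# Toy "CAD models": the 8 corners of a unit cube, and the same cube shifted along x.
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
shifted = [(x + 0.5, y, z) for (x, y, z) in cube]

d_same = chamfer_distance(cube, cube)
d_shift = chamfer_distance(cube, shifted)
```

Identical shapes give a distance of zero, and the distance grows as the shapes diverge. This brute-force version costs O(|A|·|B|) per pair, so practical pipelines typically use a spatial index or a batched GPU implementation instead.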

Cite

Text

Arsomngern et al. "Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.02047

Markdown

[Arsomngern et al. "Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/arsomngern2023cvpr-learning/) doi:10.1109/CVPR52729.2023.02047

BibTeX

@inproceedings{arsomngern2023cvpr-learning,
  title     = {{Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs}},
  author    = {Arsomngern, Pattaramanee and Nutanong, Sarana and Suwajanakorn, Supasorn},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {21371--21381},
  doi       = {10.1109/CVPR52729.2023.02047},
  url       = {https://mlanthology.org/cvpr/2023/arsomngern2023cvpr-learning/}
}