Point2Real: Bridging the Gap Between Point Cloud and Realistic Image for Open-World 3D Recognition
Abstract
Recognition in open-world scenarios is an important and challenging field, where Vision-Language Pre-training paradigms have greatly impacted the 2D domain. This has inspired growing interest in introducing 2D pre-trained models, such as CLIP, into the 3D domain to enhance point cloud understanding. Given the difference between discrete 3D point clouds and real-world 2D images, reducing this domain gap is crucial. Some recent works project point clouds onto a 2D plane to enable training-free 3D zero-shot recognition. However, this simplistic approach yields unclear or even distorted geometric structures, limiting the potential of 2D pre-trained models in 3D. To address the domain gap, we propose Point2Real, a training-free framework that uses realistic rendering to automatically transform 3D point clouds into the Vision-Language domain. Specifically, Point2Real first narrows the shape gap with a shape recovery module, which devises an iterative ball-pivoting algorithm to convert point clouds into meshes. To simulate photo-realistic images, the meshes are rendered with a set of refined candidate textures, and CLIP confidence is used to select the most suitable one. Moreover, to tackle the viewpoint challenge, a heuristic multi-view adapter aggregates features across views, using the depth surface as an effective indicator of view-specific discriminability for recognition. We conduct experiments on the ModelNet10, ModelNet40, and ScanObjectNN datasets, and the results demonstrate that Point2Real outperforms other approaches by a large margin on zero-shot and few-shot tasks.
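The abstract names two selection steps (choosing a texture by CLIP confidence, and weighting views by their depth surface) without giving formulas. The plain-Python sketch below illustrates one plausible reading of those two steps only; it is not the authors' implementation. Everything in it is an assumption: the function names are hypothetical, "CLIP confidence" is taken to mean the top-1 softmax probability over the class prompts, the view discriminability score is taken to be the fraction of depth-map pixels covered by the rendered surface, and the CLIP logits are passed in as precomputed numbers rather than produced by a real model. Mesh recovery via ball pivoting is not covered.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one image's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_texture(logits_per_texture: dict[str, Sequence[float]]) -> str:
    """Pick the candidate texture whose rendering gets the highest top-1
    CLIP confidence (assumed here: max softmax probability over the class
    prompts). Keys are hypothetical texture names; values are precomputed
    CLIP image-text logits for the image rendered with that texture."""
    return max(logits_per_texture,
               key=lambda name: max(softmax(logits_per_texture[name])))


def depth_surface_ratio(depth_map: Sequence[Sequence[float]]) -> float:
    """Hypothetical discriminability score for one view: the fraction of
    pixels where the rendered surface is visible (depth > 0)."""
    pixels = [d for row in depth_map for d in row]
    return sum(1 for d in pixels if d > 0) / len(pixels)


def aggregate_views(view_logits: Sequence[Sequence[float]],
                    depth_maps: Sequence[Sequence[Sequence[float]]]) -> int:
    """Fuse per-view class logits with weights proportional to each view's
    depth-surface ratio; return the index of the predicted class."""
    if len(view_logits) != len(depth_maps):
        raise ValueError("need exactly one depth map per view")
    scores = [depth_surface_ratio(d) for d in depth_maps]
    total = sum(scores)
    # Fall back to a plain average if no view shows any surface.
    weights = ([s / total for s in scores] if total > 0
               else [1.0 / len(scores)] * len(scores))
    num_classes = len(view_logits[0])
    fused = [sum(w * logits[c] for w, logits in zip(weights, view_logits))
             for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: fused[c])
```

For example, with two views whose logits are [1.0, 0.0] and [0.0, 2.0], a plain average would favor class 1; if the first view's depth map covers 75% of its pixels and the second only 25%, the weighted fusion returns class 0 instead. This is the intuition behind treating the depth surface as a per-view reliability signal, although the paper's actual adapter may weight views differently.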
Cite
Li et al. "Point2Real: Bridging the Gap Between Point Cloud and Realistic Image for Open-World 3D Recognition." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I4.28088
[Li et al. "Point2Real: Bridging the Gap Between Point Cloud and Realistic Image for Open-World 3D Recognition." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/li2024aaai-point/) doi:10.1609/AAAI.V38I4.28088
@inproceedings{li2024aaai-point,
title = {{Point2Real: Bridging the Gap Between Point Cloud and Realistic Image for Open-World 3D Recognition}},
author = {Li, Hanxuan and Fu, Bin and Wang, Ruiping and Chen, Xilin},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {3055--3063},
doi = {10.1609/AAAI.V38I4.28088},
url = {https://mlanthology.org/aaai/2024/li2024aaai-point/}
}