Zero-Shot RGB-D Point Cloud Registration with Pre-Trained Large Vision Model

Abstract

This paper introduces ZeroMatch, a zero-shot RGB-D point cloud registration framework that aims for robust 3D matching on unseen data without any task-specific training. Our core idea is to use the zero-shot image representation of Stable Diffusion, learned through extensive pre-training on large-scale data, to enhance point-cloud geometric descriptors for robust matching. Specifically, we combine the handcrafted geometric descriptor FPFH with Stable Diffusion features to create point descriptors that are both locally and contextually aware, enabling reliable RGB-D registration with zero-shot capability. This approach is based on our observation that Stable Diffusion features effectively encode discriminative global contextual cues, naturally alleviating the feature ambiguity that FPFH often encounters in scenes with repetitive patterns or low overlap. To further improve the cross-view consistency of Stable Diffusion features for matching, we propose a coupled-image input mode that concatenates the source and target images into a single input, replacing the original single-image mode. This design provides both inter-image and prompt-to-image consistency attention, facilitating robust cross-view feature interaction and alignment. Finally, we use nearest neighbors in feature space to construct putative correspondences for hypothesize-and-verify transformation estimation. Extensive experiments on 3DMatch, ScanNet, and ScanLoNet verify the zero-shot matching ability of our method.
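The matching stage summarized in the abstract (fused descriptors, nearest-neighbor correspondences, hypothesize-and-verify estimation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the geometric (e.g. FPFH) and visual (e.g. Stable Diffusion) per-point descriptors are already available as arrays, fuses them by a weighted concatenation with a hypothetical weight `w`, and uses a plain RANSAC loop with a Kabsch solver as a stand-in for the verification step, which the abstract does not detail. All function names and parameters are illustrative, and the coupled-image input mode is not covered.

```python
import numpy as np

def fuse_descriptors(geo, vis, w=0.5):
    """L2-normalize each descriptor type per point, then concatenate.
    The weight w is a hypothetical balancing factor, not from the paper."""
    geo = geo / (np.linalg.norm(geo, axis=1, keepdims=True) + 1e-12)
    vis = vis / (np.linalg.norm(vis, axis=1, keepdims=True) + 1e-12)
    return np.hstack([(1.0 - w) * geo, w * vis])

def nearest_neighbor_matches(desc_src, desc_tgt):
    """Pair every source point with its nearest target point in feature space."""
    d2 = ((desc_src[:, None, :] - desc_tgt[None, :, :]) ** 2).sum(axis=-1)
    return np.stack([np.arange(len(desc_src)), d2.argmin(axis=1)], axis=1)

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) such that R @ P[i] + t ~ Q[i]."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def ransac_registration(src, tgt, matches, iters=500, thresh=0.05, seed=0):
    """Hypothesize from 3 random matches, verify by counting inliers,
    then refit on the best inlier set."""
    rng = np.random.default_rng(seed)
    P, Q = src[matches[:, 0]], tgt[matches[:, 1]]
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = kabsch(P[idx], Q[idx])
        inliers = np.linalg.norm(P @ R.T + t - Q, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("too few inlier correspondences to estimate a transform")
    R, t = kabsch(P[best], Q[best])
    return R, t, best
```

In this sketch the visual channel only helps if it disambiguates points whose geometric descriptors are similar, which is the role the abstract attributes to Stable Diffusion features; the brute-force distance matrix is adequate for a few thousand points but would need a k-d tree or similar index for larger clouds.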

Cite

Text

Jiang et al. "Zero-Shot RGB-D Point Cloud Registration with Pre-Trained Large Vision Model." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01579

Markdown

[Jiang et al. "Zero-Shot RGB-D Point Cloud Registration with Pre-Trained Large Vision Model." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/jiang2025cvpr-zeroshot/) doi:10.1109/CVPR52734.2025.01579

BibTeX

@inproceedings{jiang2025cvpr-zeroshot,
  title     = {{Zero-Shot RGB-D Point Cloud Registration with Pre-Trained Large Vision Model}},
  author    = {Jiang, Haobo and Xie, Jin and Yang, Jian and Yu, Liang and Zheng, Jianmin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {16943--16952},
  doi       = {10.1109/CVPR52734.2025.01579},
  url       = {https://mlanthology.org/cvpr/2025/jiang2025cvpr-zeroshot/}
}