One-Shot 3D Object Canonicalization Based on Geometric and Semantic Consistency

Abstract

3D object canonicalization is a fundamental task, essential for various downstream tasks. Existing methods rely on either cumbersome manual processes or priors learned from extensive, per-category training samples. Real-world datasets, however, often exhibit long-tail distributions, which challenge existing learning-based methods, especially in categories with limited samples. We address this by introducing the first one-shot category-level object canonicalization framework that operates under arbitrary poses, requiring only a single canonical model as a reference (the "prior model") for each category. To canonicalize any object, our framework first extracts semantic cues with large language models (LLMs) and vision-language models (VLMs) to establish correspondences with the prior model. We introduce a novel joint energy function to enforce geometric and semantic consistency, aligning object orientations precisely despite significant shape variations. Moreover, we adopt a support-plane strategy to reduce the search space for initial poses and utilize a semantic relationship map to select the canonical pose from multiple hypotheses. Extensive experiments on multiple datasets demonstrate that our framework achieves state-of-the-art performance and validate our key design choices. Using our framework, we create the Canonical Objaverse Dataset (COD), canonicalizing 32K samples from the Objaverse-LVIS dataset, underscoring the effectiveness of our framework in handling large-scale datasets. Project page: https://jinli998.github.io/One-shot_3D_Object_Canonicalization/
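
As a rough illustration of choosing one pose from several hypotheses with a joint geometric and semantic energy, the toy sketch below may help. It is not the paper's formulation: the one-sided Chamfer term, the part-centroid term, the `weight` parameter, the restriction to rotations about the up axis, and every function name are assumptions made for this example.

```python
import math

def rotate_z(points, angle):
    """Rotate 3D points about the z (up) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def geometric_energy(points, prior_points):
    """Assumed geometric term: mean squared distance to the nearest prior point."""
    return sum(min(sq_dist(p, q) for q in prior_points) for p in points) / len(points)

def semantic_energy(parts, prior_parts):
    """Assumed semantic term: mean squared distance between same-label part centroids."""
    shared = sorted(set(parts) & set(prior_parts))
    if not shared:
        return 0.0
    return sum(sq_dist(parts[k], prior_parts[k]) for k in shared) / len(shared)

def select_pose(points, parts, prior_points, prior_parts, angles, weight=1.0):
    """Return (angle, energy) of the hypothesis with the lowest joint energy."""
    best = None
    for angle in angles:
        rotated_points = rotate_z(points, angle)
        rotated_parts = dict(zip(parts, rotate_z(list(parts.values()), angle)))
        energy = (geometric_energy(rotated_points, prior_points)
                  + weight * semantic_energy(rotated_parts, prior_parts))
        if best is None or energy < best[1]:
            best = (angle, energy)
    return best

# Toy prior model: a point set that is symmetric under a 180-degree turn,
# with one labelled part ("front") that breaks the symmetry.
prior_points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, -0.5, 0.0)]
prior_parts = {"front": (1.0, 0.0, 0.0)}

# Toy input object: the prior turned by 180 degrees about the up axis.
object_points = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -0.5, 0.0), (0.0, 0.5, 0.0)]
object_parts = {"front": (-1.0, 0.0, 0.0)}

# Four hypotheses about the up axis, as if the up direction were already fixed.
hypotheses = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
best_angle, best_energy = select_pose(
    object_points, object_parts, prior_points, prior_parts, hypotheses
)
```

In this toy case the geometric term alone cannot separate the 0 and 180 degree hypotheses because the point set is symmetric; the labelled part is what makes the 180 degree rotation the lowest-energy choice.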

Cite

Text

Jin et al. "One-Shot 3D Object Canonicalization Based on Geometric and Semantic Consistency." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01570

Markdown

[Jin et al. "One-Shot 3D Object Canonicalization Based on Geometric and Semantic Consistency." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/jin2025cvpr-oneshot/) doi:10.1109/CVPR52734.2025.01570

BibTeX

@inproceedings{jin2025cvpr-oneshot,
  title     = {{One-Shot 3D Object Canonicalization Based on Geometric and Semantic Consistency}},
  author    = {Jin, Li and Wang, Yujie and Chen, Wenzheng and Dai, Qiyu and Gao, Qingzhe and Qin, Xueying and Chen, Baoquan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {16850-16859},
  doi       = {10.1109/CVPR52734.2025.01570},
  url       = {https://mlanthology.org/cvpr/2025/jin2025cvpr-oneshot/}
}