Robust Category-Level 3D Pose Estimation from Diffusion-Enhanced Synthetic Data

Abstract

Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training data is a viable alternative, the domain shift between real and synthetic data remains a significant challenge. In this work, we aim to narrow the performance gap between models trained on synthetic data and fully supervised models trained on a large amount of real data. We approach the problem from two perspectives: 1) We introduce P3D-Diffusion, a new synthetic dataset with accurate 3D annotations generated with a graphics-guided diffusion model. 2) We propose Cross-domain 3D Consistency (CC3D) for unsupervised domain adaptation of neural mesh models. In particular, we exploit the spatial relationships between features on the mesh surface and a contrastive learning scheme to guide the domain adaptation process. Combined, these two approaches enable our models to perform competitively with state-of-the-art (SOTA) models using only 10% of the respective real training images, while outperforming the SOTA model by a wide margin using only 50% of the real training data. By encouraging diversity in the synthetic data and generating the images in an out-of-distribution (OOD)-aware manner, our model further demonstrates robust generalization to OOD scenarios despite being trained with minimal real data.

Cite

Text

Yang et al. "Robust Category-Level 3D Pose Estimation from Diffusion-Enhanced Synthetic Data." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Yang et al. "Robust Category-Level 3D Pose Estimation from Diffusion-Enhanced Synthetic Data." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/yang2024wacv-robust/)

BibTeX

@inproceedings{yang2024wacv-robust,
  title     = {{Robust Category-Level 3D Pose Estimation from Diffusion-Enhanced Synthetic Data}},
  author    = {Yang, Jiahao and Ma, Wufei and Wang, Angtian and Yuan, Xiaoding and Yuille, Alan and Kortylewski, Adam},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {3446--3455},
  url       = {https://mlanthology.org/wacv/2024/yang2024wacv-robust/}
}