3D Congealing: 3D-Aware Image Alignment in the Wild

Abstract

We propose 3D Congealing, a novel problem of 3D-aware alignment for 2D images capturing semantically similar objects. Given a collection of unlabeled Internet images, our goal is to associate the shared semantic parts from the inputs and aggregate the knowledge from 2D images to a shared 3D canonical space. We introduce a general framework that tackles the task without assuming shape templates, poses, or any camera parameters. At its core is a canonical 3D representation that encapsulates geometric and semantic information. The framework optimizes for the canonical representation together with the pose for each input image, and a per-image coordinate map that warps 2D pixel coordinates to the 3D canonical frame to account for the shape matching. The optimization procedure fuses prior knowledge from a pre-trained image generative model and semantic information from input images. The former provides strong knowledge guidance for this under-constrained task, while the latter provides the necessary information to mitigate the training data bias from the pre-trained model. Our framework can be used for various tasks such as pose estimation and image editing, achieving strong results on real-world image datasets under challenging illumination conditions and on in-the-wild online image collections. Project page at https://ai.stanford.edu/~yzzhang/projects/3d-congealing/.

Cite

Text

Zhang et al. "3D Congealing: 3D-Aware Image Alignment in the Wild." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73232-4_22

Markdown

[Zhang et al. "3D Congealing: 3D-Aware Image Alignment in the Wild." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhang2024eccv-3d/) doi:10.1007/978-3-031-73232-4_22

BibTeX

@inproceedings{zhang2024eccv-3d,
  title     = {{3D Congealing: 3D-Aware Image Alignment in the Wild}},
  author    = {Zhang, Yunzhi and Li, Zizhang and Raj, Amit and Engelhardt, Andreas and Li, Yuanzhen and Hou, Tingbo and Wu, Jiajun and Jampani, Varun},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73232-4_22},
  url       = {https://mlanthology.org/eccv/2024/zhang2024eccv-3d/}
}