Enhancing 3D Fidelity of Text-to-3D Using Cross-View Correspondences

Abstract

Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models. However, the 3D geometric fidelity of the output remains an unresolved issue: although the rendered 2D views are realistic, the underlying geometry may contain errors such as unreasonable concavities. In this work, we propose CorrespondentDream, an effective method that leverages annotation-free cross-view correspondences yielded by the diffusion U-Net to provide an additional 3D prior to the NeRF optimization process. We find that these correspondences are strongly consistent with human perception, and by adopting them in our loss design we are able to produce NeRF models with geometries that are more coherent with common sense, e.g., smoother object surfaces, yielding higher 3D fidelity. We demonstrate the efficacy of our approach through various comparative qualitative results and a solid user study.
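As a rough illustration of the idea (not the authors' implementation), a cross-view correspondence loss of this flavor can be sketched as follows: given matched pixel pairs between two rendered views and the NeRF-rendered depth in the first view, lift each matched pixel to 3D, reproject it into the second view, and penalize the distance to its match. The pinhole-camera helpers, function names, and the NumPy formulation below are all assumptions for the sketch; the paper's actual loss operates on U-Net-derived correspondences during optimization.

```python
import numpy as np

def unproject(uv, depth, K, cam_to_world):
    """Lift pixel coordinates uv with depth to world-space 3D points (pinhole model)."""
    u, v = uv[..., 0], uv[..., 1]
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    pts_cam = np.stack([x, y, depth], axis=-1)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t

def project(pts_world, K, world_to_cam):
    """Project world-space 3D points into pixel coordinates of another view."""
    R, t = world_to_cam[:3, :3], world_to_cam[:3, 3]
    pts_cam = pts_world @ R.T + t
    xy = pts_cam[..., :2] / pts_cam[..., 2:3]
    u = xy[..., 0] * K[0, 0] + K[0, 2]
    v = xy[..., 1] * K[1, 1] + K[1, 2]
    return np.stack([u, v], axis=-1)

def correspondence_loss(uv_a, uv_b, depth_a, K, cam_a_to_world, world_to_cam_b):
    """Mean reprojection error: correct geometry makes lifted matches from
    view A land on their corresponding pixels in view B."""
    pts = unproject(uv_a, depth_a, K, cam_a_to_world)
    uv_b_pred = project(pts, K, world_to_cam_b)
    return np.mean(np.linalg.norm(uv_b_pred - uv_b, axis=-1))
```

If the rendered depth is consistent with the correspondences, the loss is near zero; concavities or other geometric errors shift the reprojected points away from their matches, producing a gradient signal toward a more plausible surface.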

Cite

Text

Kim et al. "Enhancing 3D Fidelity of Text-to-3D Using Cross-View Correspondences." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01013

Markdown

[Kim et al. "Enhancing 3D Fidelity of Text-to-3D Using Cross-View Correspondences." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/kim2024cvpr-enhancing/) doi:10.1109/CVPR52733.2024.01013

BibTeX

@inproceedings{kim2024cvpr-enhancing,
  title     = {{Enhancing 3D Fidelity of Text-to-3D Using Cross-View Correspondences}},
  author    = {Kim, Seungwook and Li, Kejie and Deng, Xueqing and Shi, Yichun and Cho, Minsu and Wang, Peng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {10649--10658},
  doi       = {10.1109/CVPR52733.2024.01013},
  url       = {https://mlanthology.org/cvpr/2024/kim2024cvpr-enhancing/}
}