ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Abstract
3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or fine-tune them on synthetic data, which often results in non-photorealistic 3D objects without backgrounds. In this paper, we present a method that leverages pretrained text-to-image models as a prior and learns to generate multi-view images in a single denoising process from real-world data. Concretely, we propose to integrate 3D volume-rendering and cross-frame-attention layers into each block of the existing U-Net network of the text-to-image model. Moreover, we design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint. We train our model on real-world datasets of objects and showcase its capabilities to generate instances with a variety of high-quality shapes and textures in authentic surroundings. Compared to the existing methods, the results generated by our method are consistent and have favorable visual quality (-30% FID, -37% KID).
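To make the cross-frame-attention idea concrete, the following is a minimal sketch of one common form of it, in which every token of every view attends to the tokens of all views. It is not the paper's layer: the abstract does not describe the exact design, so the function name `cross_frame_attention`, the nested-list input layout, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(frames):
    """Hypothetical sketch: attention shared across all frames (views).

    frames: list of N frames, each a list of T token vectors (lists of D floats).
    Queries, keys, and values are the raw tokens here (an assumption);
    a real layer would apply learned projections first.
    """
    dim = len(frames[0][0])
    scale = 1.0 / math.sqrt(dim)
    # Shared key/value set: the tokens of every frame, flattened into one list.
    all_tokens = [tok for frame in frames for tok in frame]
    out = []
    for frame in frames:
        out_frame = []
        for q in frame:
            # Scaled dot-product scores of this query against all frames' tokens.
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in all_tokens]
            weights = softmax(scores)
            # Weighted average of the values, one output vector per query token.
            out_frame.append(
                [sum(w * v[d] for w, v in zip(weights, all_tokens)) for d in range(dim)]
            )
        out.append(out_frame)
    return out
```

Because the keys and values are pooled over all views, each output token is a mixture of information from every view, which is the mechanism that lets the views influence one another during a single denoising pass.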
Cite
Text
Höllein et al. "ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00482
Markdown
[Höllein et al. "ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/hollein2024cvpr-viewdiff/) doi:10.1109/CVPR52733.2024.00482
BibTeX
@inproceedings{hollein2024cvpr-viewdiff,
title = {{ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models}},
author = {Höllein, Lukas and Božič, Aljaž and Müller, Norman and Novotny, David and Tseng, Hung-Yu and Richardt, Christian and Zollhöfer, Michael and Nießner, Matthias},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {5043-5052},
doi = {10.1109/CVPR52733.2024.00482},
url = {https://mlanthology.org/cvpr/2024/hollein2024cvpr-viewdiff/}
}