ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion

Abstract

Given a single image of a 3D object, this paper proposes a novel method (named ConsistNet) that generates multiple images of the same object as if they were captured from different viewpoints, while effectively exploiting the 3D (multi-view) consistency among the generated images. Central to our method is a lightweight multi-view consistency block that enables information exchange across multiple single-view diffusion processes based on the underlying principles of multi-view geometry. ConsistNet is an extension to the standard latent diffusion model (LDM) and consists of two submodules: (a) a view aggregation module that unprojects multi-view features into global 3D volumes and infers consistency, and (b) a ray aggregation module that samples and aggregates 3D-consistent features back to each view to enforce consistency. Our approach departs from previous methods in multi-view image generation in that it can be easily dropped into pre-trained LDMs without requiring explicit pixel correspondences or depth prediction. Experiments show that our method effectively learns 3D consistency over a frozen Zero123-XL backbone and can generate 16 surrounding views of the object within 11 seconds on a single A100 GPU.
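The abstract describes the consistency block only at a high level. As a rough illustration of the two submodules, the toy Python sketch below unprojects per-view features into a shared voxel grid and then pulls them back into each view. It is not the authors' implementation. ConsistNet's modules are learned and operate on LDM latents with real camera geometry, whereas here every step is a fixed rule chosen for illustration, and all names (`view_aggregation`, `ray_aggregation`, `Projector`, and so on) are hypothetical. Further assumptions: cameras are abstracted as functions mapping a 3D point to a pixel, and features are sampled at the nearest pixel. The "consistency" signal is the per-voxel mean and variance across views (a variance cost in the style of multi-view stereo, swapped in as an assumption). Ray aggregation is a variance-weighted average over the voxels that project onto each pixel.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical types for this sketch (not from the paper).
Point = Tuple[float, float, float]
Feature = List[float]                       # C channels
FeatureMap = List[List[Feature]]            # [H][W][C]
Projector = Callable[[Point], Optional[Tuple[int, int]]]  # point -> (row, col) or None
Cell = Optional[Tuple[Feature, Feature]]    # (mean, variance), or None if unseen


def sample_feature(fmap: FeatureMap, pix: Tuple[int, int]) -> Optional[Feature]:
    """Nearest-pixel lookup; returns None when the pixel is outside the map."""
    r, c = pix
    if 0 <= r < len(fmap) and 0 <= c < len(fmap[0]):
        return fmap[r][c]
    return None


def view_aggregation(fmaps: Sequence[FeatureMap], projectors: Sequence[Projector],
                     voxels: Sequence[Point]) -> List[Cell]:
    """Unproject every view into a shared voxel grid.

    Each voxel stores the per-channel mean and variance of the features of all
    views it projects into; a low variance means the views agree at that point.
    """
    volume: List[Cell] = []
    for p in voxels:
        samples: List[Feature] = []
        for fmap, proj in zip(fmaps, projectors):
            pix = proj(p)
            f = sample_feature(fmap, pix) if pix is not None else None
            if f is not None:
                samples.append(f)
        if not samples:
            volume.append(None)  # voxel not visible in any view
            continue
        n, ch = len(samples), len(samples[0])
        mean = [sum(s[k] for s in samples) / n for k in range(ch)]
        var = [sum((s[k] - mean[k]) ** 2 for s in samples) / n for k in range(ch)]
        volume.append((mean, var))
    return volume


def ray_aggregation(fmap: FeatureMap, proj: Projector, voxels: Sequence[Point],
                    volume: Sequence[Cell]) -> FeatureMap:
    """Pull volume features back into one view.

    A pixel's "ray" is approximated by the voxels that project onto it. Their
    mean features are averaged with weight 1 / (1 + total variance), so voxels
    where the views disagree contribute less. Pixels hit by no voxel keep
    their original feature.
    """
    h, w, ch = len(fmap), len(fmap[0]), len(fmap[0][0])
    acc = [[[0.0] * ch for _ in range(w)] for _ in range(h)]
    wsum = [[0.0] * w for _ in range(h)]
    for p, cell in zip(voxels, volume):
        pix = proj(p)
        if cell is None or pix is None:
            continue
        r, c = pix
        if not (0 <= r < h and 0 <= c < w):
            continue
        mean, var = cell
        weight = 1.0 / (1.0 + sum(var))
        for k in range(ch):
            acc[r][c][k] += weight * mean[k]
        wsum[r][c] += weight
    return [[([a / wsum[r][c] for a in acc[r][c]] if wsum[r][c] > 0 else list(fmap[r][c]))
             for c in range(w)] for r in range(h)]
```

For example, with two orthographic views looking along the z and x axes over a 2x2x2 grid, features that differ between the views (1.0 in one, 3.0 in the other) are pulled toward their shared per-voxel mean of 2.0 in both views, while identical inputs pass through unchanged. A learned model would instead decide both how to aggregate and how much of the result to mix back into each view's features.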

Cite

Text

Yang et al. "ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00676

Markdown

[Yang et al. "ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/yang2024cvpr-consistnet/) doi:10.1109/CVPR52733.2024.00676

BibTeX

@inproceedings{yang2024cvpr-consistnet,
  title     = {{ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion}},
  author    = {Yang, Jiayu and Cheng, Ziang and Duan, Yunfei and Ji, Pan and Li, Hongdong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {7079--7088},
  doi       = {10.1109/CVPR52733.2024.00676},
  url       = {https://mlanthology.org/cvpr/2024/yang2024cvpr-consistnet/}
}