Consistent View Synthesis with Pose-Guided Diffusion Models

Abstract

Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches. More qualitative results are available at https://poseguided-diffusion.github.io/.
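
The epipolar-line-constrained attention mentioned in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes pinhole cameras sharing an intrinsic matrix K, a known relative pose (R, t) from the target to the source view, and a Gaussian weighting of point-to-line distance; the names skew, epipolar_weight_map, epipolar_attention, and sigma are all illustrative. The idea is to bias cross-view attention so that each target-view query mostly attends to source-view tokens lying near its epipolar line.

import torch


def skew(t):
    # Cross-product (skew-symmetric) matrix of a 3-vector.
    tx, ty, tz = float(t[0]), float(t[1]), float(t[2])
    return torch.tensor([[0.0, -tz,  ty],
                         [ tz, 0.0, -tx],
                         [-ty,  tx, 0.0]])


def epipolar_weight_map(K, R, t, h, w, sigma=1.0):
    # For each target pixel, build a soft weight over all source pixels that
    # decays with distance to the corresponding epipolar line.
    K_inv = torch.linalg.inv(K)
    F_mat = K_inv.T @ skew(t) @ R @ K_inv  # fundamental matrix (maps target pixels to source-view lines)

    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)  # (N, 3) homogeneous pixels

    lines = pix @ F_mat.T                                   # (N, 3): one epipolar line per target pixel
    a, b, c = lines[:, 0:1], lines[:, 1:2], lines[:, 2:3]
    # Point-to-line distance of every source pixel to every target pixel's epipolar line.
    dist = (a * pix[:, 0] + b * pix[:, 1] + c).abs() / (a ** 2 + b ** 2).sqrt().clamp(min=1e-8)
    return torch.exp(-dist ** 2 / (2 * sigma ** 2))         # (N_target, N_source) soft mask


def epipolar_attention(q, k, v, weight_map):
    # Scaled dot-product attention with an additive log-bias from the epipolar
    # weight map; q comes from the target view, k and v from the source view.
    logits = q @ k.T / q.shape[-1] ** 0.5
    logits = logits + torch.log(weight_map.clamp(min=1e-8))  # suppress off-epipolar pairs
    return torch.softmax(logits, dim=-1) @ v

In practice such a weight map would be computed at the (low) resolution of the diffusion model's feature maps, so the N x N distance matrix stays small; the log-bias formulation keeps the operation differentiable rather than applying a hard mask.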

Cite

Text

Tseng et al. "Consistent View Synthesis with Pose-Guided Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01609

Markdown

[Tseng et al. "Consistent View Synthesis with Pose-Guided Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/tseng2023cvpr-consistent/) doi:10.1109/CVPR52729.2023.01609

BibTeX

@inproceedings{tseng2023cvpr-consistent,
  title     = {{Consistent View Synthesis with Pose-Guided Diffusion Models}},
  author    = {Tseng, Hung-Yu and Li, Qinbo and Kim, Changil and Alsisan, Suhib and Huang, Jia-Bin and Kopf, Johannes},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {16773--16783},
  doi       = {10.1109/CVPR52729.2023.01609},
  url       = {https://mlanthology.org/cvpr/2023/tseng2023cvpr-consistent/}
}