Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Abstract
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene, given any number of input views and target cameras. Existing works struggle to generate either large viewpoint changes or temporally smooth samples, while relying on specific task configurations. Our approach overcomes these limitations through a simple model design, an optimized training recipe, and a flexible sampling strategy that generalize across view synthesis tasks at test time. As a result, our samples maintain high consistency without requiring additional 3D representation-based distillation, thus streamlining view synthesis in the wild. Furthermore, we show that our method can generate high-quality videos lasting up to half a minute with seamless loop closure. Extensive benchmarking demonstrates that Seva outperforms existing methods across different datasets and settings.
Cite
Text
Zhou et al. "Stable Virtual Camera: Generative View Synthesis with Diffusion Models." International Conference on Computer Vision, 2025.
Markdown
[Zhou et al. "Stable Virtual Camera: Generative View Synthesis with Diffusion Models." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhou2025iccv-stable/)
BibTeX
@inproceedings{zhou2025iccv-stable,
  title     = {{Stable Virtual Camera: Generative View Synthesis with Diffusion Models}},
  author    = {Zhou, Jensen and Gao, Hang and Voleti, Vikram and Vasishta, Aaryaman and Yao, Chun-Han and Boss, Mark and Torr, Philip and Rupprecht, Christian and Jampani, Varun},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {12405--12414},
  url       = {https://mlanthology.org/iccv/2025/zhou2025iccv-stable/}
}