SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

Abstract

We present Stable Video Materials 3D (SViM3D), a framework that predicts multi-view consistent physically based rendering (PBR) materials from a single image. Recently, video diffusion models have been used successfully to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models or needs to be estimated in additional pipeline steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated RGB view under explicit camera control. This unique setup allows for direct relighting in a 2.5D setting, and for generating a 3D asset using our model as a neural prior. We introduce various mechanisms into this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse image inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games, and other visual media.

Cite

Text

Engelhardt et al. "SViM3D: Stable Video Material Diffusion for Single Image 3D Generation." International Conference on Computer Vision, 2025.

Markdown

[Engelhardt et al. "SViM3D: Stable Video Material Diffusion for Single Image 3D Generation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/engelhardt2025iccv-svim3d/)

BibTeX

@inproceedings{engelhardt2025iccv-svim3d,
  title     = {{SViM3D: Stable Video Material Diffusion for Single Image 3D Generation}},
  author    = {Engelhardt, Andreas and Boss, Mark and Voleti, Vikram and Yao, Chun-Han and Lensch, Hendrik P. A. and Jampani, Varun},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {28428--28439},
  url       = {https://mlanthology.org/iccv/2025/engelhardt2025iccv-svim3d/}
}