SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
Abstract
We present Stable Video Materials 3D (SViM3D), a framework that predicts multi-view consistent physically based rendering (PBR) materials from a single image. Recently, video diffusion models have been used to efficiently reconstruct 3D objects from a single image. However, reflectance is still represented by simple material models or must be estimated in additional pipeline steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially-varying PBR parameters and surface normals jointly with each generated RGB view under explicit camera control. This unique setup allows for direct relighting in a 2.5D setting, and for generating a 3D asset using our model as a neural prior. We introduce various mechanisms into this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse image inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games, and other visual media.
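The "direct relighting in a 2.5D setting" follows because each generated view carries per-pixel PBR parameters and normals, so a renderer can shade it under a new light without any 3D reconstruction. Below is a minimal sketch of that per-view case; the simplified Cook-Torrance shading, the single directional light, and all array and parameter names are assumptions for illustration, not the paper's actual renderer.

import numpy as np

def relight_view(albedo, roughness, metallic, normals,
                 light_dir, view_dir, light_color=(1.0, 1.0, 1.0)):
    """Shade one predicted view under a new directional light.

    albedo:    (H, W, 3) base color in [0, 1]
    roughness: (H, W)    perceptual roughness in [0, 1]
    metallic:  (H, W)    metalness in [0, 1]
    normals:   (H, W, 3) surface normals in camera space
    light_dir, view_dir: 3-vectors pointing away from the surface
    """
    eps = 1e-6
    n = normals / np.maximum(np.linalg.norm(normals, axis=-1, keepdims=True), eps)
    l = np.asarray(light_dir, float); l /= np.linalg.norm(l)
    v = np.asarray(view_dir, float);  v /= np.linalg.norm(v)
    h = (l + v) / np.linalg.norm(l + v)            # half vector (constant here)

    n_dot_l = np.clip(n @ l, 0.0, 1.0)[..., None]  # (H, W, 1)
    n_dot_v = np.clip(n @ v, eps, 1.0)[..., None]
    n_dot_h = np.clip(n @ h, 0.0, 1.0)[..., None]
    v_dot_h = float(np.clip(v @ h, 0.0, 1.0))      # scalar for fixed light/view

    m = metallic[..., None]
    a2 = roughness[..., None] ** 4                 # GGX alpha^2 = roughness^4

    # GGX normal distribution, Schlick Fresnel, and Schlick-GGX geometry terms.
    d = a2 / np.maximum(np.pi * (n_dot_h**2 * (a2 - 1.0) + 1.0) ** 2, eps)
    f0 = 0.04 * (1.0 - m) + albedo * m             # dielectric vs. metal base
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5
    k = (roughness[..., None] + 1.0) ** 2 / 8.0
    g = (n_dot_l / (n_dot_l * (1 - k) + k)) * (n_dot_v / (n_dot_v * (1 - k) + k))

    specular = d * f * g / np.maximum(4.0 * n_dot_l * n_dot_v, eps)
    diffuse = (1.0 - f) * (1.0 - m) * albedo / np.pi
    color = (diffuse + specular) * np.asarray(light_color) * n_dot_l
    return np.clip(color, 0.0, 1.0)

# Example on dummy maps (H = W = 4):
rng = np.random.default_rng(0)
img = relight_view(rng.random((4, 4, 3)), rng.random((4, 4)),
                   rng.random((4, 4)), np.tile([0.0, 0.0, 1.0], (4, 4, 1)),
                   light_dir=[0.3, 0.4, 0.9], view_dir=[0.0, 0.0, 1.0])

For a full relightable 3D asset, the paper instead uses the diffusion model as a neural prior during reconstruction; the sketch above covers only the per-view 2.5D case.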
Cite
Text
Engelhardt et al. "SViM3D: Stable Video Material Diffusion for Single Image 3D Generation." International Conference on Computer Vision, 2025.
Markdown
[Engelhardt et al. "SViM3D: Stable Video Material Diffusion for Single Image 3D Generation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/engelhardt2025iccv-svim3d/)
BibTeX
@inproceedings{engelhardt2025iccv-svim3d,
title = {{SViM3D: Stable Video Material Diffusion for Single Image 3D Generation}},
author = {Engelhardt, Andreas and Boss, Mark and Voleti, Vikram and Yao, Chun-Han and Lensch, Hendrik P. A. and Jampani, Varun},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {28428--28439},
url = {https://mlanthology.org/iccv/2025/engelhardt2025iccv-svim3d/}
}