Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

Abstract

The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene’s lighting, geometry and materials, as well as the image formation process. While recent large-scale diffusion models have shown strong generative and inpainting capabilities, we find that current models do not sufficiently “understand” the scene shown in a single picture to generate consistent lighting effects (shadows, bright reflections, etc.) while preserving the identity and details of the composited object. We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process. Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes. Our physically based pipeline further enables automatic materials and tone-mapping refinement.
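The abstract describes diffusion guidance applied to a physically based inverse-rendering loop that recovers lighting and tone-mapping parameters. Purely as a rough illustration of that general shape (and not the authors' pipeline), the sketch below optimizes a few learnable compositing parameters by gradient descent on a guidance loss. The renderer render_composite, the loss guidance_loss, and the parameterization are hypothetical toy stand-ins; in particular, the personalized-diffusion-model guidance from the paper is replaced here by a plain image loss for brevity.

# Minimal sketch (not the authors' code): guided optimization of lighting /
# tone-mapping parameters for object insertion. The differentiable "renderer"
# and the guidance loss below are toy stand-ins; in the paper these would be a
# physically based renderer and a personalized large diffusion model.

import torch

def render_composite(background, object_mask, light_dir, exposure, gamma):
    # Toy differentiable compositing: a Lambertian-style shading term from a
    # learnable light direction, followed by simple exposure/gamma tone mapping.
    shading = torch.sigmoid(light_dir.sum()) * object_mask
    linear = background * (1.0 - object_mask) + shading
    return (exposure * linear).clamp(min=1e-6) ** gamma

def guidance_loss(image, target):
    # Stand-in for a score-distillation-style loss from a personalized
    # diffusion model; here just a fixed "target" image for illustration.
    return torch.nn.functional.mse_loss(image, target)

def optimize_insertion(background, object_mask, target, steps=200):
    # Learnable scene parameters recovered by the inverse-rendering loop.
    light_dir = torch.zeros(3, requires_grad=True)
    exposure  = torch.tensor(1.0, requires_grad=True)
    gamma     = torch.tensor(1.0, requires_grad=True)
    opt = torch.optim.Adam([light_dir, exposure, gamma], lr=1e-2)

    for _ in range(steps):
        opt.zero_grad()
        image = render_composite(background, object_mask, light_dir, exposure, gamma)
        loss = guidance_loss(image, target)  # diffusion guidance would plug in here
        loss.backward()
        opt.step()
    return light_dir.detach(), exposure.detach(), gamma.detach()

if __name__ == "__main__":
    H = W = 64
    background = torch.rand(3, H, W)
    object_mask = torch.zeros(3, H, W)
    object_mask[:, 16:48, 16:48] = 1.0   # square "object" footprint
    target = background.clone()          # placeholder guidance target
    print(optimize_insertion(background, object_mask, target, steps=50))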

Cite

Text

Liang et al. "Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73030-6_25

Markdown

[Liang et al. "Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/liang2024eccv-photorealistic/) doi:10.1007/978-3-031-73030-6_25

BibTeX

@inproceedings{liang2024eccv-photorealistic,
  title     = {{Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering}},
  author    = {Liang, Ruofan and Gojcic, Zan and Nimier-David, Merlin and Acuna, David and Vijaykumar, Nandita and Fidler, Sanja and Wang, Zian},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73030-6_25},
  url       = {https://mlanthology.org/eccv/2024/liang2024eccv-photorealistic/}
}