Zero-Shot Depth Aware Image Editing with Diffusion Models

Abstract

Diffusion models have transformed image editing but struggle with precise depth-aware control, such as placing objects at a specified depth. Layered representations offer fine-grained control by decomposing an image into separate editable layers. However, existing methods represent a scene simplistically as a set of background and transparent foreground layers and ignore the scene geometry, which limits their effectiveness for depth-aware editing. We propose Depth-Guided Layer Decomposition, a layering method that decomposes an image into foreground and background layers based on a user-specified depth value, enabling precise depth-aware edits. We further propose Feature Guided Layer Compositing, a zero-shot approach for realistic layer compositing that leverages generative priors from pretrained diffusion models. Specifically, we guide the internal U-Net features to progressively fuse the individual layers into a composite latent at each denoising step. This preserves the structure of each layer while producing realistic outputs with appropriate color and lighting adjustments, without the need for post-hoc harmonization models. We demonstrate our method on two key depth-aware editing tasks: 1) scene compositing, which blends the foreground of one scene with the background of another at a specified depth; and 2) object insertion at a user-defined depth. Our zero-shot approach achieves precise depth ordering and high-quality edits, surpassing specialized scene compositing and object placement baselines, as validated on benchmarks and in user studies.
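
The abstract does not give the exact layering rule. The following is only a minimal NumPy sketch of what "decompose at a user-specified depth" and "insert at a user-defined depth" mean geometrically, and it rests on several assumptions:

- The depth map is metric-like, with smaller values closer to the camera. Many monocular estimators output inverse depth instead, which flips the comparisons.
- The split is a hard threshold at depth `d`.
- Compositing uses the plain alpha "over" operator.

The function names `decompose_by_depth`, `over` and `insert_object` are hypothetical. This is the kind of naive cut-and-paste composite whose color and lighting mismatches Feature Guided Layer Compositing is meant to fix inside the diffusion U-Net; it is not that method. It also does not fill the background region that removing the foreground leaves uncovered.

```python
import numpy as np


def decompose_by_depth(image: np.ndarray, depth: np.ndarray, d: float):
    """Split an H x W x 3 float image (values in [0, 1]) into two RGBA layers.

    Hypothetical helper. Assumption: smaller depth values are closer to the
    camera. Pixels nearer than d go to the foreground layer, all others to
    the background layer (hard threshold, no soft edges, no hole filling).
    """
    fg_alpha = (depth < d).astype(np.float32)[..., None]  # H x W x 1 mask
    foreground = np.concatenate([image, fg_alpha], axis=-1)
    background = np.concatenate([image, 1.0 - fg_alpha], axis=-1)
    return foreground, background


def over(top_rgba: np.ndarray, bottom_rgb: np.ndarray) -> np.ndarray:
    """Plain alpha 'over' operator: paint an RGBA layer onto an RGB image."""
    alpha = top_rgba[..., 3:4]
    return alpha * top_rgba[..., :3] + (1.0 - alpha) * bottom_rgb


def insert_object(image: np.ndarray, depth: np.ndarray,
                  obj_rgba: np.ndarray, d: float) -> np.ndarray:
    """Naively paste an RGBA object (same H x W as the scene) at depth d.

    Hypothetical helper. A per-pixel depth test hides the object wherever the
    scene is nearer than d, so near scene content occludes it. No color or
    lighting harmonization is performed.
    """
    visible = (depth >= d).astype(np.float32)[..., None]
    obj = np.concatenate([obj_rgba[..., :3], obj_rgba[..., 3:4] * visible], axis=-1)
    return over(obj, image)


if __name__ == "__main__":
    scene = np.zeros((2, 2, 3), dtype=np.float32)           # black scene
    depth = np.array([[1.0, 5.0], [1.0, 5.0]], np.float32)  # left near, right far
    obj = np.ones((2, 2, 4), dtype=np.float32)              # opaque white object
    out = insert_object(scene, depth, obj, d=3.0)
    print(out[..., 0])  # left column occluded (0), right column shows object (1)
```

Under the same assumptions, a naive version of scene compositing would be `over(fg_a, image_b)`, where `fg_a` is the foreground layer returned by `decompose_by_depth` for scene A. That version ignores scene B's own near content and any disoccluded regions.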

Cite

Text

Parihar et al. "Zero-Shot Depth Aware Image Editing with Diffusion Models." International Conference on Computer Vision, 2025.

BibTeX

@inproceedings{parihar2025iccv-zeroshot,
  title     = {{Zero-Shot Depth Aware Image Editing with Diffusion Models}},
  author    = {Parihar, Rishubh and Vs, Sachidanand and Babu, R. Venkatesh},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {15748--15759},
  url       = {https://mlanthology.org/iccv/2025/parihar2025iccv-zeroshot/}
}