StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
Abstract
The demand for stereo images increases as manufacturers launch more extended reality (XR) devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is training-free and straightforward to use with seamless integration into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight method for fast generation of stereo image pairs, without the need for fine-tuning model weights or any post-processing of images. Using the original input to generate a left image and estimate a disparity map for it, we generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layer Modifications to align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
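To make the Stereo Pixel Shift idea concrete, the following is a minimal NumPy sketch of a disparity-driven horizontal shift applied to a latent tensor. It is an illustration under stated assumptions, not the authors' implementation: the function name `shift_latent`, the `scale` parameter, the (C, H, W) layout, the rounding to integer offsets, and the convention that positive disparity moves content to the left in the right view are all hypothetical choices made for this example. The returned hole mask marks disoccluded positions, which is the kind of region a masked denoising step would need to fill.

```python
import numpy as np

def shift_latent(latent, disparity, scale=1.0):
    """Forward-warp a (C, H, W) latent horizontally by a per-pixel disparity.

    Hypothetical helper for illustration only. Returns the shifted latent and
    a boolean (H, W) mask that is True where no source pixel landed.
    """
    c, h, w = latent.shape
    shifted = np.zeros_like(latent)
    filled = np.zeros((h, w), dtype=bool)
    # Assumption: round disparities to whole latent pixels
    offsets = np.rint(disparity * scale).astype(int)
    for y in range(h):
        # Visit pixels from small to large disparity so that nearer content
        # (larger disparity) overwrites farther content on collisions
        order = np.argsort(offsets[y], kind="stable")
        for x in order:
            tx = x - offsets[y, x]  # assumed convention: shift left by disparity
            if 0 <= tx < w:
                shifted[:, y, tx] = latent[:, y, x]
                filled[y, tx] = True
    return shifted, ~filled
```

In this sketch, positions left empty by the shift stay at zero and are flagged in the mask; how such holes are actually filled in the paper's pipeline is not specified by the abstract beyond the named masking and attention components.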
Cite
Text
Wang et al. "StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00737
Markdown
[Wang et al. "StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/wang2024cvprw-stereodiffusion/) doi:10.1109/CVPRW63382.2024.00737
BibTeX
@inproceedings{wang2024cvprw-stereodiffusion,
title = {{StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models}},
author = {Wang, Lezhong and Frisvad, Jeppe Revall and Jensen, Mark Bo and Bigdeli, Siavash Arjomand},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {7416-7425},
doi = {10.1109/CVPRW63382.2024.00737},
url = {https://mlanthology.org/cvprw/2024/wang2024cvprw-stereodiffusion/}
}