DiMViS: Diffusion-Based Multi-View Synthesis
Abstract
Multi-view observations offer a broader perception of the real world than observations acquired from a single viewpoint. While existing multi-view 2D diffusion models for novel view synthesis typically rely on a single conditioning reference image, only a limited number of methods accommodate multiple reference images, explicitly conditioning the generation process through tailored attention mechanisms. In contrast, we introduce DiMViS, a novel method that enables conditional generation in multi-view settings by means of a joint diffusion model. DiMViS builds on a pre-trained diffusion model and combines it with an innovative masked diffusion process that implicitly learns the underlying conditional data distribution, endowing our method with the ability to produce multiple images given a flexible number of reference views. Our experimental evaluation demonstrates DiMViS's superior performance compared to current state-of-the-art methods, while achieving reference-to-target and target-to-target visual consistency.
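The masked diffusion idea described above can be illustrated with a short, hypothetical sketch: reference views are kept clean while only target views are noised, so a joint denoiser implicitly learns the conditional distribution of targets given an arbitrary subset of references. The `denoiser` callable, the linear noise schedule, and the `(B, V, C, H, W)` tensor layout below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the DiMViS code): one masked joint-diffusion
# training step. Reference views stay clean; target views are diffused,
# and the loss is computed on target views only.
import torch


def masked_diffusion_step(denoiser, views, ref_mask, num_steps=1000):
    """views:    (B, V, C, H, W) all views of each scene
       ref_mask: (B, V) boolean, True where a view is a clean reference"""
    B, V = views.shape[:2]

    # Sample one timestep per scene; placeholder linear alpha-bar schedule
    # standing in for whatever schedule the pre-trained model actually uses.
    t = torch.randint(1, num_steps + 1, (B,), device=views.device)
    alpha_bar = (1.0 - t.float() / num_steps).view(B, 1, 1, 1, 1)

    noise = torch.randn_like(views)
    noised = alpha_bar.sqrt() * views + (1.0 - alpha_bar).sqrt() * noise

    # Masking: keep reference views clean, diffuse only the target views.
    keep = ref_mask.view(B, V, 1, 1, 1).float()
    model_input = keep * views + (1.0 - keep) * noised

    # A joint denoiser attends across all views and predicts the noise.
    pred_noise = denoiser(model_input, t, ref_mask)

    # Denoising loss restricted to target (non-reference) views.
    target_mask = 1.0 - keep
    loss = ((pred_noise - noise) ** 2 * target_mask).sum() / target_mask.sum().clamp(min=1.0)
    return loss
```

At sampling time the same masking applies: the given reference views are held fixed at every step while the remaining views are iteratively denoised, which is what allows a flexible number of references without retraining.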
Cite
Text
Di Giacomo et al. "DiMViS: Diffusion-Based Multi-View Synthesis." ICML 2024 Workshops: SPIGM, 2024.
Markdown
[Di Giacomo et al. "DiMViS: Diffusion-Based Multi-View Synthesis." ICML 2024 Workshops: SPIGM, 2024.](https://mlanthology.org/icmlw/2024/giacomo2024icmlw-dimvis/)
BibTeX
@inproceedings{giacomo2024icmlw-dimvis,
title = {{DiMViS: Diffusion-Based Multi-View Synthesis}},
author = {Di Giacomo, Giuseppe and Franzese, Giulio and Cerquitelli, Tania and Chiasserini, Carla Fabiana and Michiardi, Pietro},
booktitle = {ICML 2024 Workshops: SPIGM},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/giacomo2024icmlw-dimvis/}
}