MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion

Shin, Minjung; Cho, Hyunin; Go, Sooyeon; Kim, Jin-Hwa; Uh, Youngjung

ICLR 2026

Abstract

Multi-view generation with camera pose control and prompt-based customization are both essential for controllable generative models. However, existing multi-view generation models do not support customization with geometric consistency, whereas customization models lack explicit viewpoint control, which makes the two difficult to unify. Motivated by these gaps, we introduce a novel task, multi-view customization, which aims to jointly achieve multi-view camera pose control and customization. Because training data for customization is scarce, existing multi-view generation models, which inherently rely on large-scale datasets, struggle to generalize to diverse prompts. To address this, we propose MVCustom, a novel diffusion-based framework explicitly designed to achieve both multi-view consistency and customization fidelity. In the training stage, MVCustom learns the subject's identity and geometry using a feature-field representation, incorporating a text-to-video diffusion backbone enhanced with dense spatio-temporal attention, which leverages temporal coherence for multi-view consistency. In the inference stage, we introduce two novel techniques: depth-aware feature rendering, which explicitly enforces geometric consistency, and consistent-aware latent completion, which ensures accurate perspective alignment of the customized subject and surrounding backgrounds. Extensive experiments demonstrate that MVCustom achieves the most balanced and consistently competitive performance across multi-view consistency and customization fidelity, offering an effective solution to this multi-objective generation task.
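The abstract does not say how depth-aware feature rendering is implemented. As a rough illustration of the general idea of rendering features into a new viewpoint using depth, the sketch below forward-warps a feature map from a source camera to a target camera with a pinhole model and a simple z-buffer. It is a generic depth-based reprojection, not the authors' method; the function name `warp_features`, the shared intrinsics `fx, fy, cx, cy`, and the nearest-pixel splatting are all assumptions made for this example.

```python
def warp_features(feat, depth, fx, fy, cx, cy, R, t):
    """Forward-warp a source feature map into a target view using per-pixel depth.

    Hypothetical helper for illustration only (not from the paper).

    feat:  H x W grid of feature vectors (nested lists)
    depth: H x W grid of positive depths along the source camera's z-axis
    fx, fy, cx, cy: pinhole intrinsics, assumed shared by both views
    R, t:  3x3 rotation and length-3 translation mapping source-camera
           coordinates to target-camera coordinates
    Returns (warped, mask): warped[r][c] is a feature vector or None, and
    mask[r][c] is True where a source pixel landed.
    """
    H, W = len(feat), len(feat[0])
    warped = [[None] * W for _ in range(H)]
    zbuf = [[float("inf")] * W for _ in range(H)]

    for v in range(H):
        for u in range(W):
            d = depth[v][u]
            # Unproject pixel (u, v) to a 3D point in the source camera frame.
            p = ((u - cx) / fx * d, (v - cy) / fy * d, d)
            # Move the point into the target camera frame: q = R p + t.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 0:
                continue  # point lies behind the target camera
            # Project to the nearest target pixel.
            ut = round(fx * q[0] / q[2] + cx)
            vt = round(fy * q[1] / q[2] + cy)
            # Keep only the nearest surface per pixel (a simple z-buffer).
            if 0 <= ut < W and 0 <= vt < H and q[2] < zbuf[vt][ut]:
                zbuf[vt][ut] = q[2]
                warped[vt][ut] = feat[v][u]

    mask = [[cell is not None for cell in row] for row in warped]
    return warped, mask
```

Pixels where `mask` is False are regions the source view never observed (disocclusions or content outside the source frame). In a pipeline of the kind the abstract describes, those are the regions a completion step would need to fill in.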

Cite

Text

Shin et al. "MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion." International Conference on Learning Representations, 2026.

Markdown

[Shin et al. "MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/shin2026iclr-mvcustom/)

BibTeX

@inproceedings{shin2026iclr-mvcustom,
  title     = {{MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion}},
  author    = {Shin, Minjung and Cho, Hyunin and Go, Sooyeon and Kim, Jin-Hwa and Uh, Youngjung},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/shin2026iclr-mvcustom/}
}