LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Jin, Haian; Jiang, Hanwen; Tan, Hao; Zhang, Kai; Bi, Sai; Zhang, Tianyuan; Luan, Fujun; Snavely, Noah; Xu, Zexiang

ICLR 2025

Abstract

We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens, functioning as a fully learned scene representation, and decodes novel-view images from them; and (2) a decoder-only LVSM, which directly maps input images to novel-view outputs, completely eliminating intermediate scene representations. Both models bypass the 3D inductive biases used in previous methods, from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps), and address novel view synthesis with a fully data-driven approach. While the encoder-decoder model offers faster inference due to its independent latent representation, the decoder-only LVSM achieves superior quality, scalability, and zero-shot generalization, outperforming previous state-of-the-art methods by 1.5 to 3.5 dB PSNR. Comprehensive evaluations across multiple datasets demonstrate that both LVSM variants achieve state-of-the-art novel view synthesis quality, delivering superior performance even with reduced computational resources (1-2 GPUs). Please see our project website for more details: https://haian-jin.github.io/projects/LVSM/
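
The token flow of the two variants described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single layer of single-head self-attention over concatenated token sequences, random weights, and hypothetical token counts and names (`encode`, `decode`, `num_latents`, and so on). It only shows which tokens each variant attends over when rendering one target view.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # token width (hypothetical)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over a (num_tokens, D) array."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def make_weights():
    """Random query/key/value projections standing in for trained ones."""
    return [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]


def encode(input_tokens, latent_tokens, weights):
    """Encoder (assumed form): latents attend jointly with input tokens."""
    joint = np.concatenate([input_tokens, latent_tokens], axis=0)
    out = self_attention(joint, *weights)
    return out[len(input_tokens):]  # keep only the updated latent tokens


def decode(context_tokens, target_tokens, weights):
    """Decoder (assumed form): target tokens attend jointly with a context."""
    joint = np.concatenate([context_tokens, target_tokens], axis=0)
    out = self_attention(joint, *weights)
    return out[len(context_tokens):]  # keep only the target-view tokens


# Hypothetical sizes: 2 input views, 64 patch tokens per view, 8 latents.
num_views, tokens_per_view, num_latents = 2, 64, 8
input_tokens = rng.normal(size=(num_views * tokens_per_view, D))
latent_init = rng.normal(size=(num_latents, D))
target_queries = rng.normal(size=(tokens_per_view, D))  # target-pose tokens

enc_w, dec_w = make_weights(), make_weights()

# (1) Encoder-decoder: compress inputs into latents once, then decode from them.
latents = encode(input_tokens, latent_init, enc_w)
out_encdec = decode(latents, target_queries, dec_w)

# (2) Decoder-only: target tokens attend to the input tokens directly.
out_deconly = decode(input_tokens, target_queries, dec_w)

# Sequence length processed per rendered target view in each variant.
encdec_context = latents.shape[0] + target_queries.shape[0]
deconly_context = input_tokens.shape[0] + target_queries.shape[0]
```

Under these assumptions, the encoder-decoder variant decodes each target view against a fixed number of latent tokens regardless of how many input views were given, while the decoder-only variant processes every input token for every target view. That is one plausible reading of why the abstract reports faster inference for the former.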

Cite

Text

Jin et al. "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias." International Conference on Learning Representations, 2025.

Markdown

[Jin et al. "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/jin2025iclr-lvsm/)

BibTeX

@inproceedings{jin2025iclr-lvsm,
  title     = {{LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias}},
  author    = {Jin, Haian and Jiang, Hanwen and Tan, Hao and Zhang, Kai and Bi, Sai and Zhang, Tianyuan and Luan, Fujun and Snavely, Noah and Xu, Zexiang},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/jin2025iclr-lvsm/}
}