Sharp Monocular View Synthesis in Less than a Second

Abstract

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25–34% and DISTS by 21–43% versus the best prior model, while lowering the synthesis time by three orders of magnitude. Code and weights are provided at https://github.com/apple/ml-sharp.
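The abstract does not give SHARP's exact per-Gaussian parameterization. As a rough illustration only, the sketch below assumes the layout commonly used in 3D Gaussian splatting: a mean, per-axis scales, a rotation quaternion, an opacity, and a color. It shows how such parameters define an anisotropic covariance Σ = R S Sᵀ Rᵀ. It also shows why a metric representation makes camera moves simple: a translation given in metres applies directly to the Gaussian positions. The class `Gaussian3D` and the helpers `quat_to_matrix`, `covariance`, and `world_to_camera` are hypothetical names and are not taken from the ml-sharp code.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    # Hypothetical layout (assumption): common 3D Gaussian splatting parameters.
    mean: tuple[float, float, float]              # center, metres (metric scene)
    scale: tuple[float, float, float]             # per-axis std. deviations, metres
    rotation: tuple[float, float, float, float]   # quaternion (w, x, y, z)
    opacity: float                                # in [0, 1]
    color: tuple[float, float, float]             # RGB in [0, 1]


def quat_to_matrix(q):
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize to a unit quaternion
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_matrix(g.rotation)
    s2 = [s * s for s in g.scale]
    # (R diag(s^2) R^T)_ij = sum_k R_ik * s_k^2 * R_jk
    return [[sum(R[i][k] * s2[k] * R[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]


def world_to_camera(point, cam_pos):
    """Simplified (assumption): camera axes aligned with world axes, so only
    the metric camera translation is subtracted."""
    return tuple(p - c for p, c in zip(point, cam_pos))


g = Gaussian3D(mean=(0.0, 0.0, 2.0), scale=(0.1, 0.2, 0.3),
               rotation=(1.0, 0.0, 0.0, 0.0), opacity=0.9,
               color=(0.8, 0.5, 0.2))
sigma = covariance(g)
# Moving the camera 0.1 m to the right shifts the Gaussian 0.1 m left in camera space.
shifted = world_to_camera(g.mean, (0.1, 0.0, 0.0))
```

Because the scene is assumed to be in metres, a requested camera offset such as 0.1 m needs no per-image rescaling. A representation with unknown scale would first need the offset converted into its own arbitrary units.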

Cite

Mescheder et al. "Sharp Monocular View Synthesis in Less than a Second." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{mescheder2026iclr-sharp,
  title     = {{Sharp Monocular View Synthesis in Less than a Second}},
  author    = {Mescheder, Lars and Dong, Wei and Li, Shiwei and Bai, Xuyang and Santos, Marcel and Hu, Peiyun and Lecouat, Bruno and Zhen, Mingmin and Delaunoy, Amaël and Fang, Tian and Tsin, Yanghai and Richter, Stephan and Koltun, Vladlen},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/mescheder2026iclr-sharp/}
}