Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

Qian, Ming; Xia, Zimin; Liu, Changkun; Ma, Shuailei; Wang, Wen; Ke, Zeran; Tan, Bin; Zhang, Hang; Xia, Gui-Song

ICLR 2026

Abstract

Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models use feed-forward image-to-3D frameworks to generate holistic scenes by jointly learning geometry and texture, a process that yields rich content but coarse and unstable geometry. We attribute these geometric failures to the extreme viewpoint gap and the sparse, inconsistent supervision inherent in satellite-to-street data. To address these fundamental challenges, we introduce Sat3DGen, which embodies a geometry-first methodology. This methodology enhances the feed-forward paradigm by integrating novel geometric constraints with a perspective-view training strategy, explicitly countering the primary sources of geometric error. The geometry-centric strategy yields a dramatic leap in both 3D accuracy and photorealism. For validation, we first construct a new benchmark by pairing the VIGOR-OOD test set with high-resolution Digital Surface Model (DSM) data. On this benchmark, our method reduces geometric RMSE from 6.76 m to 5.20 m. Crucially, this geometric leap also boosts photorealism, reducing the Fréchet Inception Distance (FID) from ~40 to 19 against the leading method, Sat2Density++, despite using no extra tailored image-quality modules. We demonstrate the versatility of our high-quality 3D assets through diverse downstream applications, including semantic-map-to-3D synthesis, multi-camera video generation, large-scale meshing, and unsupervised single-image DSM estimation. The code has been released at https://github.com/qianmingduowan/Sat3DGen.
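
To make the geometric metric concrete, the following is a minimal sketch of how an RMSE between a predicted height map and a reference DSM could be computed. The abstract does not describe the evaluation protocol (alignment, masking, resolution), so the function name `dsm_rmse`, the `valid` mask, and the handling of non-finite pixels are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def dsm_rmse(pred, ref, valid=None):
    """Root-mean-square error between two height maps, in the input units.

    Hypothetical helper: `pred` and `ref` are same-shaped arrays of heights
    (e.g. metres). `valid` is an optional boolean mask of pixels to score.
    Pixels that are NaN or infinite in either map are always ignored.
    """
    pred = np.asarray(pred, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")

    finite = np.isfinite(pred) & np.isfinite(ref)
    if valid is None:
        valid = finite
    else:
        valid = np.asarray(valid, dtype=bool) & finite
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")

    diff = pred[valid] - ref[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Toy example: a flat prediction against a reference with one missing pixel.
    pred = np.zeros((2, 2))
    ref = np.array([[2.0, 2.0], [2.0, np.nan]])
    print(dsm_rmse(pred, ref))
```

Under this definition, the reported drop from 6.76 m to 5.20 m corresponds to a relative RMSE reduction of roughly 23%.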

Cite

Text

Qian et al. "Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image." International Conference on Learning Representations, 2026.

Markdown

[Qian et al. "Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/qian2026iclr-sat3dgen/)

BibTeX

@inproceedings{qian2026iclr-sat3dgen,
  title     = {{Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image}},
  author    = {Qian, Ming and Xia, Zimin and Liu, Changkun and Ma, Shuailei and Wang, Wen and Ke, Zeran and Tan, Bin and Zhang, Hang and Xia, Gui-Song},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/qian2026iclr-sat3dgen/}
}