Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Abstract
Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off. Geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. Proxy-based models, in contrast, use feed-forward image-to-3D frameworks to generate holistic scenes by jointly learning geometry and texture, which yields rich content but coarse and unstable geometry. We attribute these geometric failures to the extreme viewpoint gap and the sparse, inconsistent supervision inherent in satellite-to-street data. To address these fundamental challenges, we introduce Sat3DGen, which embodies a geometry-first methodology. It enhances the feed-forward paradigm by integrating novel geometric constraints with a perspective-view training strategy, explicitly countering the primary sources of geometric error. This geometry-centric strategy yields a dramatic leap in both 3D accuracy and photorealism. For validation, we construct a new benchmark by pairing the VIGOR-OOD test set with high-resolution Digital Surface Model (DSM) data. On this benchmark, our method improves geometric RMSE from 6.76 m to 5.20 m. Crucially, this geometric leap also boosts photorealism, reducing the Fréchet Inception Distance (FID) from ~40 to 19 relative to the leading method, Sat2Density++, despite using no extra tailored image-quality modules. We demonstrate the versatility of our high-quality 3D assets through diverse downstream applications, including semantic-map-to-3D synthesis, multi-camera video generation, large-scale meshing, and unsupervised single-image DSM estimation. The code is available at https://github.com/qianmingduowan/Sat3DGen.
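The geometric RMSE quoted in the abstract compares a predicted height map with a reference DSM. The snippet below is only a minimal sketch of how such a metric is commonly computed, not the paper's evaluation code. The function name `dsm_rmse`, the list-of-rows input layout, and the rule of skipping non-finite cells are all assumptions made for illustration. Any alignment, masking, or scale normalization used in the actual benchmark is not described here and is not modeled.

```python
import math


def dsm_rmse(pred_dsm, gt_dsm):
    """Root-mean-square height error between two same-shaped height maps.

    Both inputs are lists of rows of floats, in the same unit (e.g. meters).
    Cells where either value is NaN or infinite are skipped (an assumption
    of this sketch, standing in for "no valid measurement").
    """
    if len(pred_dsm) != len(gt_dsm):
        raise ValueError("height maps must have the same number of rows")

    squared_errors = []
    for pred_row, gt_row in zip(pred_dsm, gt_dsm):
        if len(pred_row) != len(gt_row):
            raise ValueError("height maps must have the same number of columns")
        for pred_h, gt_h in zip(pred_row, gt_row):
            # Only compare cells that are valid in both maps.
            if math.isfinite(pred_h) and math.isfinite(gt_h):
                squared_errors.append((pred_h - gt_h) ** 2)

    if not squared_errors:
        raise ValueError("no valid cells to compare")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Toy 2x2 maps: the prediction is flat, the reference is 3 m higher everywhere.
    predicted = [[0.0, 0.0], [0.0, 0.0]]
    reference = [[3.0, 3.0], [3.0, 3.0]]
    print(dsm_rmse(predicted, reference))
```

Because the errors are squared before averaging, a few large height mistakes (for example a missing tall building) dominate the score more than many small ones.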
Cite
Text
Qian et al. "Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image." International Conference on Learning Representations, 2026.
Markdown
[Qian et al. "Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/qian2026iclr-sat3dgen/)
BibTeX
@inproceedings{qian2026iclr-sat3dgen,
title = {{Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image}},
author = {Qian, Ming and Xia, Zimin and Liu, Changkun and Ma, Shuailei and Wang, Wen and Ke, Zeran and Tan, Bin and Zhang, Hang and Xia, Gui-Song},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/qian2026iclr-sat3dgen/}
}