Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis

Abstract

Ground-to-aerial image synthesis aims to generate realistic aerial images from corresponding ground-level street-view images while keeping the content layout consistent, simulating a top-down view. The large viewpoint difference creates a domain gap between the two views, and dense urban scenes limit the visible range of street views, which makes this cross-view generation task particularly challenging. In this paper, we introduce SkyDiffusion, a novel cross-view generation method that synthesizes aerial images from street-view images using a diffusion model and the Bird's-Eye View (BEV) paradigm. The Curved-BEV method in SkyDiffusion converts street-view images into a BEV perspective, effectively bridging the domain gap, and employs a "multi-to-one" mapping strategy to address occlusion in dense urban scenes. SkyDiffusion then uses a BEV-guided diffusion model to generate content-consistent and realistic aerial images. Additionally, we introduce a novel dataset, Ground2Aerial-3 (G2A-3), designed for diverse ground-to-aerial image synthesis applications, including disaster scene aerial synthesis, low-altitude UAV image synthesis, and historical high-resolution satellite image synthesis. Experimental results demonstrate that SkyDiffusion outperforms state-of-the-art methods on cross-view datasets covering natural (CVUSA), suburban (CVACT), and urban (VIGOR-Chicago) scenes, as well as various application scenarios (G2A-3), achieving realistic and content-consistent aerial image generation. The code, datasets, and further information about this work are available at https://opendatalab.github.io/skydiffusion/.
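
To make the idea of converting a street-view image into a BEV perspective concrete, the following is a minimal sketch of a generic flat-ground inverse projection from an equirectangular panorama. It is not the Curved-BEV method described above, whose details are not given in the abstract. The function name `panorama_to_bev` and the parameters `bev_size`, `ground_range_m`, and `camera_height_m` are hypothetical, and the sketch assumes the horizon lies on the middle row of the panorama and the forward direction on its centre column.

```python
import numpy as np


def panorama_to_bev(pano, bev_size=64, ground_range_m=20.0, camera_height_m=2.0):
    """Resample an equirectangular panorama onto a flat ground plane.

    Illustrative flat-ground projection only (NOT Curved-BEV).
    Assumes: horizon at the middle row, forward direction at the centre column.
    """
    h, w = pano.shape[:2]

    # Metric coordinates of each BEV pixel centre; the camera sits at the centre.
    coords = (np.arange(bev_size) + 0.5) / bev_size * 2 * ground_range_m - ground_range_m
    # x grows to the right along columns; row 0 is the farthest point ahead.
    x, y = np.meshgrid(coords, coords[::-1])

    dist = np.hypot(x, y)                          # ground distance from the camera
    azimuth = np.arctan2(x, y)                     # 0 = forward, positive = to the right
    depression = np.arctan2(camera_height_m, dist)  # angle below the horizon

    # Equirectangular lookup: azimuth maps to a column, depression to a row.
    u = (azimuth / (2 * np.pi) + 0.5) * w
    v = (0.5 + depression / np.pi) * h

    # Nearest-neighbour sampling; wrap columns, clamp rows.
    u_idx = np.floor(u).astype(int) % w
    v_idx = np.clip(np.floor(v).astype(int), 0, h - 1)
    return pano[v_idx, u_idx]
```

Under these assumptions every BEV pixel samples from the lower half of the panorama, because a point on the ground always lies below the horizon. A flat-ground model of this kind cannot see behind buildings, which is the occlusion problem that the abstract says the "multi-to-one" mapping strategy is meant to address.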

Cite

Text

Ye et al. "Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis." International Conference on Computer Vision, 2025.

Markdown

[Ye et al. "Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/ye2025iccv-leveraging/)

BibTeX

@inproceedings{ye2025iccv-leveraging,
  title     = {{Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis}},
  author    = {Ye, Junyan and He, Jun and Li, Weijia and Lv, Zhutao and Lin, Yi and Yu, Jinhua and Yang, Haote and He, Conghui},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {28451--28461},
  url       = {https://mlanthology.org/iccv/2025/ye2025iccv-leveraging/}
}