SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion

Abstract

Recently, camera-based solutions have been extensively explored for semantic scene completion (SSC). Despite their success in visible areas, existing methods struggle to capture complete scene semantics due to frequent visual occlusions. To address this limitation, this paper presents SGFormer, the first satellite-ground cooperative SSC framework, exploring the potential of satellite-ground image pairs for the SSC task. Specifically, we propose a dual-branch architecture that encodes orthogonal satellite and ground views in parallel and unifies them into a common domain. Additionally, we design a ground-view guidance strategy that pre-corrects satellite image biases during feature encoding, addressing the misalignment between satellite and ground views. Moreover, we develop an adaptive weighting strategy that balances the contributions of the satellite and ground views. Experiments demonstrate that SGFormer outperforms the state of the art on the SemanticKITTI and SSCBench-KITTI-360 datasets. Our code is available at https://github.com/gxytcrc/SGFormer.
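To give a concrete intuition for the adaptive weighting step described above, below is a minimal PyTorch sketch of per-voxel fusion between two branch features in a shared domain. The module name, tensor shapes, and the 1x1x1 convolutional weight head are illustrative assumptions for this sketch, not the paper's actual implementation (see the repository linked above for that).

```python
# Minimal sketch: adaptive per-voxel weighting of ground and satellite
# voxel features. All names and shapes here are assumptions, not SGFormer's
# actual design.
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Fuses ground- and satellite-branch voxel features with learned weights."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-voxel confidence logit for each branch
        # from the concatenated features of both branches.
        self.weight_head = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, 2, kernel_size=1),
        )

    def forward(self, ground: torch.Tensor, satellite: torch.Tensor) -> torch.Tensor:
        # ground, satellite: (B, C, X, Y, Z) voxel features in a common domain.
        logits = self.weight_head(torch.cat([ground, satellite], dim=1))
        w = torch.softmax(logits, dim=1)  # (B, 2, X, Y, Z); weights sum to 1 per voxel
        return w[:, :1] * ground + w[:, 1:] * satellite


if __name__ == "__main__":
    fuse = AdaptiveFusion(channels=32)
    g = torch.randn(1, 32, 16, 16, 4)
    s = torch.randn(1, 32, 16, 16, 4)
    print(fuse(g, s).shape)  # torch.Size([1, 32, 16, 16, 4])
```

The softmax ensures the two branch weights compete per voxel, so occluded regions in the ground view can lean on the satellite branch and vice versa; this is one plausible realization of the balancing behavior the abstract describes.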

Cite

Text

Guo et al. "SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01114

Markdown

[Guo et al. "SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/guo2025cvpr-sgformer/) doi:10.1109/CVPR52734.2025.01114

BibTeX

@inproceedings{guo2025cvpr-sgformer,
  title     = {{SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion}},
  author    = {Guo, Xiyue and Hu, Jiarui and Hu, Junjie and Bao, Hujun and Zhang, Guofeng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {11929--11938},
  doi       = {10.1109/CVPR52734.2025.01114},
  url       = {https://mlanthology.org/cvpr/2025/guo2025cvpr-sgformer/}
}