Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery

Abstract

We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images by lifting noisy 2D labels to 3D. This is a challenging problem for two primary reasons. First, objects in urban aerial images, including buildings, cars, and roads, exhibit substantial variations in size, which poses a significant challenge for accurate 2D segmentation. Second, the 2D labels generated by existing segmentation methods suffer from the multi-view inconsistency problem, especially in the case of aerial images, where each image captures only a small portion of the entire scene. To overcome these limitations, we first introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes by combining labels predicted from different altitudes, harnessing the novel-view synthesis capabilities of NeRF. We then introduce a novel cross-view instance label grouping strategy based on the 3D scene representation to mitigate the multi-view inconsistency problem in the 2D instance labels. Furthermore, we exploit multi-view reconstructed depth priors to improve the geometric quality of the reconstructed radiance field, resulting in enhanced segmentation results. Experiments on multiple real-world urban-scale datasets demonstrate that our approach outperforms existing methods, highlighting its effectiveness. The source code is available at https://github.com/zyqz97/Aerial_lifting.

Cite

Text

Zhang et al. "Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01993

Markdown

[Zhang et al. "Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-aerial/) doi:10.1109/CVPR52733.2024.01993

BibTeX

@inproceedings{zhang2024cvpr-aerial,
  title     = {{Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery}},
  author    = {Zhang, Yuqi and Chen, Guanying and Chen, Jiaxing and Cui, Shuguang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {21092--21103},
  doi       = {10.1109/CVPR52733.2024.01993},
  url       = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-aerial/}
}