Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning

Abstract

Improving the generalization of multi-camera 3D object detection is essential for safe autonomous driving in the real world. In this paper, we consider a realistic yet more challenging scenario: improving generalization when only single-source data is available for training, since gathering data from diverse domains and annotating it is time-consuming and labor-intensive. To this end, we propose the Fourier Cross-View Learning (FCVL) framework, which comprises Fourier Hierarchical Augmentation (FHiAug), an augmentation strategy in the frequency domain that boosts domain diversity, and a Fourier Cross-View Semantic Consistency Loss that helps the model learn more domain-invariant features from adjacent perspectives. Furthermore, we provide theoretical guarantees via augmentation graph theory. To the best of our knowledge, this is the first study to explore generalizable multi-camera 3D object detection with a single source. Extensive experiments on various testing domains demonstrate that our approach achieves the best performance among various domain generalization methods.
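
As a rough illustration of what augmentation in the frequency domain can look like, the sketch below perturbs the Fourier amplitude of an image while leaving its phase untouched. This is a generic amplitude-jitter technique, not the FHiAug procedure, whose details the abstract does not give. The function name `amplitude_jitter`, the `strength` parameter, and the uniform noise model are all assumptions made for this example.

```python
import numpy as np

def amplitude_jitter(image, strength=0.2, rng=None):
    """Randomly rescale the Fourier amplitude of an H x W x C float image.

    Hypothetical helper for illustration only; assumes 0 <= strength < 1 so
    that every scale factor stays positive and the phase is left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    # 2D FFT over the spatial axes, computed independently per channel
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    # One real-valued scale factor per frequency bin and channel
    scale = 1.0 + strength * rng.uniform(-1.0, 1.0, size=amplitude.shape)
    perturbed = scale * amplitude * np.exp(1j * phase)
    # Keeping the real part is equivalent to symmetrizing the scale factors,
    # which is what a real-valued output image requires
    return np.real(np.fft.ifft2(perturbed, axes=(0, 1)))

# Usage: augment a random stand-in for a camera image
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(8, 8, 3))
augmented = amplitude_jitter(img, strength=0.2, rng=rng)
```

Amplitude is commonly associated with appearance (illumination, texture statistics) and phase with structure, which is why this kind of perturbation is often chosen when the goal is to vary style while keeping object layout intact.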

Cite

Text

Zhao et al. "Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhao et al. "Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhao2025icml-generalizable/)

BibTeX

@inproceedings{zhao2025icml-generalizable,
  title     = {{Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning}},
  author    = {Zhao, Xue and Gu, Qinying and Wang, Xinbing and Zhou, Chenghu and Ye, Nanyang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {77521-77538},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhao2025icml-generalizable/}
}