Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning
Abstract
Improving the generalization of multi-camera 3D object detection is essential for safe autonomous driving in the real world. In this paper, we consider a realistic yet more challenging scenario: improving generalization when only single-source data is available for training, since gathering data from diverse domains and collecting annotations is time-consuming and labor-intensive. To this end, we propose the Fourier Cross-View Learning (FCVL) framework, which comprises Fourier Hierarchical Augmentation (FHiAug), an augmentation strategy in the frequency domain that boosts domain diversity, and a Fourier Cross-View Semantic Consistency Loss that encourages the model to learn more domain-invariant features from adjacent perspectives. Furthermore, we provide theoretical guarantees via augmentation graph theory. To the best of our knowledge, this is the first study to explore generalizable multi-camera 3D object detection with a single source. Extensive experiments on various testing domains demonstrate that our approach achieves the best performance among various domain generalization methods.
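To make the idea of augmentation in the frequency domain concrete, the following is a minimal sketch of a generic Fourier amplitude perturbation. It is not the paper's FHiAug, whose details the abstract does not give. The function name `fourier_amplitude_jitter`, the `strength` parameter, and the uniform noise model are all assumptions chosen for illustration. The sketch relies on the common observation that the amplitude spectrum mostly carries style-like appearance while the phase carries structure, so it perturbs the amplitude and keeps the phase.

```python
import numpy as np


def fourier_amplitude_jitter(image, strength=0.2, rng=None):
    """Perturb the Fourier amplitude of an image while keeping its phase.

    Hypothetical helper for illustration only; not the FHiAug method.
    `image` is a float array of shape (H, W, C) with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2D FFT over the spatial axes, computed independently per channel.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Assumed noise model: multiplicative uniform noise on the amplitude.
    scale = rng.uniform(1.0 - strength, 1.0 + strength, size=amplitude.shape)

    # Recombine the perturbed amplitude with the original phase.
    perturbed = amplitude * scale * np.exp(1j * phase)

    # The noise is not Hermitian-symmetric, so keep only the real part.
    augmented = np.fft.ifft2(perturbed, axes=(0, 1)).real
    return np.clip(augmented, 0.0, 1.0)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(0)
    demo_image = demo_rng.random((8, 8, 3))
    augmented = fourier_amplitude_jitter(demo_image, strength=0.2, rng=demo_rng)
    print(augmented.shape)
```

In a multi-camera setting, one would presumably apply such a transform to each camera image before the detector, but how FCVL schedules or structures its augmentations across views is not stated here.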
Cite
Text
Zhao et al. "Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Zhao et al. "Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhao2025icml-generalizable/)
BibTeX
@inproceedings{zhao2025icml-generalizable,
title = {{Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning}},
author = {Zhao, Xue and Gu, Qinying and Wang, Xinbing and Zhou, Chenghu and Ye, Nanyang},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {77521-77538},
volume = {267},
url = {https://mlanthology.org/icml/2025/zhao2025icml-generalizable/}
}