Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable
Abstract
While camera-based 3D object detection has evolved rapidly, these models are susceptible to overfitting to specific sensor setups. For example, in autonomous driving, most datasets are collected using a single sensor configuration. This paper evaluates the generalization capability of camera-based 3D object detectors, including adapting detectors from one dataset to another and training detectors on multiple datasets jointly. We observe that merely aggregating datasets yields drastic performance drops, contrary to the improvements expected from more training data. To close the gap, we introduce an efficient technique for aligning disparate sensor configurations: a combination of camera intrinsic synchronization, camera extrinsic correction, and ego frame alignment, which together substantially improve cross-dataset performance. Compared with single-dataset baselines, we achieve mAP improvements of 42.3 on KITTI, 23.2 on Lyft, 18.5 on nuScenes, 17.3 on KITTI-360, 8.4 on Argoverse 2, and 3.9 on Waymo. We hope this comprehensive study can facilitate research on generalizable 3D object detection and associated tasks.
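The page does not include implementation details, but a rough sketch may help clarify what camera intrinsic synchronization involves: images from cameras with different focal lengths are rescaled so that objects at the same depth subtend the same number of pixels under a shared virtual camera. The function name and the reference focal length f_target below are hypothetical illustrations, not from the paper, and assume a standard pinhole intrinsic matrix K.

import cv2
import numpy as np

def synchronize_intrinsics(image, K, f_target=1000.0):
    # Rescale the image so its focal length matches a shared virtual camera.
    # After this, objects at the same depth occupy the same number of pixels
    # regardless of which dataset's camera captured them.
    scale = f_target / K[0, 0]  # ratio of target to source focal length fx
    h, w = image.shape[:2]
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    K_new = K.astype(np.float64).copy()
    K_new[:2, :] *= scale  # fx, fy, cx, cy all scale with the image
    return resized, K_new

Extrinsic correction and ego frame alignment would analogously map each dataset's camera poses and ego coordinate conventions into a shared frame before training.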
Cite

Zheng et al. "Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable." Conference on Robot Learning, 2023.
BibTeX:
@inproceedings{zheng2023corl-crossdataset,
  title     = {{Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable}},
  author    = {Zheng, Liangtao and Liu, Yicheng and Wang, Yue and Zhao, Hang},
  booktitle = {Conference on Robot Learning},
  year      = {2023},
  pages     = {1903--1929},
  volume    = {229},
  url       = {https://mlanthology.org/corl/2023/zheng2023corl-crossdataset/}
}