FULLER: Unified Multi-Modality Multi-Task 3D Perception via Multi-Level Gradient Calibration

Abstract

Multi-modality fusion and multi-task learning are becoming increasingly common in 3D autonomous driving scenarios, motivated by the need for robust prediction under a limited computation budget. However, naively extending existing frameworks to multi-modality multi-task learning remains ineffective and can even be harmful, owing to modality bias and task conflict. Previous works coordinate the learning framework manually using empirical knowledge, which may lead to sub-optimal solutions. To mitigate these issues, we propose a novel yet simple multi-level gradient calibration learning framework that operates across tasks and modalities during optimization. Specifically, the gradients produced by the task heads and used to update the shared backbone are calibrated at the backbone's last layer to alleviate task conflict. Before the calibrated gradients are propagated further to the modality branches of the backbone, their magnitudes are calibrated again to the same level, ensuring that the downstream tasks pay balanced attention to different modalities. Experiments on the large-scale nuScenes benchmark demonstrate the effectiveness of the proposed method, e.g., an absolute 14.4% mIoU improvement on map segmentation and a 1.4% mAP improvement on 3D detection, advancing the application of multi-modality fusion and multi-task learning to 3D autonomous driving. We also discuss the links between modalities and tasks.
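
The two calibration levels described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the calibration rule, so the sketch assumes the simplest stand-in, rescaling each gradient to the mean norm of its group. The function names, the toy 4-dimensional gradients, and the split of the shared feature into a "camera" half and a "lidar" half are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for an all-zero gradient


def calibrate_task_gradients(task_grads):
    """Level 1 (assumed rule): rescale each task's gradient at the last
    shared layer to the mean norm across tasks, then sum them."""
    norms = [np.linalg.norm(g) for g in task_grads]
    target = float(np.mean(norms))
    return sum(g * (target / (n + EPS)) for g, n in zip(task_grads, norms))


def calibrate_modality_gradients(branch_grads):
    """Level 2 (assumed rule): rescale the gradient entering each modality
    branch so that all branches receive the same magnitude."""
    norms = {name: np.linalg.norm(g) for name, g in branch_grads.items()}
    target = float(np.mean(list(norms.values())))
    return {name: g * (target / (norms[name] + EPS))
            for name, g in branch_grads.items()}


# Toy gradients of two task losses w.r.t. a 4-dim shared feature.
# Detection dominates in magnitude (norm 5 vs. norm 1).
grad_detection = np.array([3.0, 0.0, 4.0, 0.0])
grad_segmentation = np.array([0.0, 1.0, 0.0, 0.0])

# Level 1: balance the tasks at the shared layer.
shared = calibrate_task_gradients([grad_detection, grad_segmentation])

# Hypothetical layout: first half of the feature comes from the camera
# branch, second half from the lidar branch.
balanced = calibrate_modality_gradients(
    {"camera": shared[:2], "lidar": shared[2:]}
)

for name, g in balanced.items():
    print(name, g, np.linalg.norm(g))
```

In a real training loop the same idea would be applied to backpropagated tensors rather than hand-written vectors, and the rescaling rule would be whatever the paper specifies; the sketch only shows where the two calibration steps sit relative to each other.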

Cite

Text

Huang et al. "FULLER: Unified Multi-Modality Multi-Task 3D Perception via Multi-Level Gradient Calibration." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00324

Markdown

[Huang et al. "FULLER: Unified Multi-Modality Multi-Task 3D Perception via Multi-Level Gradient Calibration." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/huang2023iccv-fuller/) doi:10.1109/ICCV51070.2023.00324

BibTeX

@inproceedings{huang2023iccv-fuller,
  title     = {{FULLER: Unified Multi-Modality Multi-Task 3D Perception via Multi-Level Gradient Calibration}},
  author    = {Huang, Zhijian and Lin, Sihao and Liu, Guiyu and Luo, Mukun and Ye, Chaoqiang and Xu, Hang and Chang, Xiaojun and Liang, Xiaodan},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {3502-3511},
  doi       = {10.1109/ICCV51070.2023.00324},
  url       = {https://mlanthology.org/iccv/2023/huang2023iccv-fuller/}
}