MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Abstract

Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and the traffic context, but existing works often neglect the potential benefits of joint learning between these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, smooth traffic). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We evaluate MMTL-UniAD on the AIDE dataset through a series of ablation studies and show that it outperforms state-of-the-art methods across all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD.
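To make the dual-branch idea concrete, here is a minimal sketch, not the authors' implementation: it assumes each task's embedding is a blend of a task-shared feature vector and a task-specific one, with a per-task mixing weight standing in for the adaptively learned parameters. All names and values below are illustrative.

```python
def dual_branch_embed(shared, specific, alpha):
    """Blend a task-shared feature vector with a task-specific one.

    alpha in [0, 1] plays the role of the adaptive task-shared weight;
    in the actual framework this balance would be learned, not fixed.
    """
    return [alpha * s + (1.0 - alpha) * t for s, t in zip(shared, specific)]

# Toy features for two of the four tasks (values are made up).
shared_feat = [0.2, 0.4, 0.6]
driver_behavior_feat = [1.0, 0.0, 0.5]
traffic_context_feat = [0.0, 1.0, 0.3]

# Each task mixes the shared branch with its own specific branch.
emb_behavior = dual_branch_embed(shared_feat, driver_behavior_feat, alpha=0.5)
emb_context = dual_branch_embed(shared_feat, traffic_context_feat, alpha=0.7)
```

Because the shared branch enters every task's embedding, gradients from all four tasks update it (cross-task transfer), while each task-specific branch absorbs features that would otherwise conflict.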

Cite

Text

Liu et al. "MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00644

Markdown

[Liu et al. "MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/liu2025cvpr-mmtluniad/) doi:10.1109/CVPR52734.2025.00644

BibTeX

@inproceedings{liu2025cvpr-mmtluniad,
  title     = {{MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception}},
  author    = {Liu, Wenzhuo and Wang, Wenshuo and Qiao, Yicheng and Guo, Qiannan and Zhu, Jiayin and Li, Pengfei and Chen, Zilong and Yang, Huiming and Li, Zhiwei and Wang, Lening and Tan, Tiao and Liu, Huaping},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {6864-6874},
  doi       = {10.1109/CVPR52734.2025.00644},
  url       = {https://mlanthology.org/cvpr/2025/liu2025cvpr-mmtluniad/}
}