Augmented Self-Mask Attention Transformer for Naturalistic Driving Action Recognition

Abstract

Naturalistic driving action recognition based on computer vision techniques provides a crucial means of identifying and eliminating distracted driving behavior. Existing methods often extract features through fixed-size sliding windows and predict an action's start and end times. However, the information within a fixed-size window may be incomplete or redundant, and the connections between different windows are insufficiently modeled. To alleviate this problem, we propose a novel Augmented Self-Mask Attention (AMA) architecture that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order. We further employ an ensemble technique with weighted boundary fusion to combine and refine predicted action boundaries that have high confidence scores. On the test dataset of AI City Challenge 2024 Track 3, the proposed model achieves significant results compared with other teams and ranks first on the public leaderboard of the challenge. Code is available at https://github.com/wolfworld6/AIcity2024-track3.
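
The permutation phrasing in the abstract echoes the XLNet pre-training objective; for reference, that objective (not necessarily AMA's exact formulation, which the abstract does not spell out) maximizes, over all factorization orders, the expected autoregressive log-likelihood. With $\mathcal{Z}_T$ the set of permutations of a length-$T$ sequence, $z_t$ the $t$-th index of a permutation $\mathbf{z}$, and $\mathbf{x}_{\mathbf{z}_{<t}}$ the tokens preceding it in that order:

\max_{\theta}\; \mathbb{E}_{\mathbf{z}\sim\mathcal{Z}_T}\left[\sum_{t=1}^{T}\log p_{\theta}\left(x_{z_t}\mid \mathbf{x}_{\mathbf{z}_{<t}}\right)\right]

The abstract also leaves the weighted boundary fusion step unspecified; the sketch below is a minimal, hypothetical Python illustration of one common variant (a 1D temporal analogue of weighted boxes fusion), in which segments whose temporal IoU exceeds a threshold are clustered and each cluster's boundaries are averaged with confidence weights. The function names and the iou_thr parameter are illustrative assumptions, not the authors' API.

def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def fuse(cluster):
    """Confidence-weighted average of a cluster's boundaries; mean score."""
    total = sum(s for _, _, s in cluster)
    start = sum(b * s for b, _, s in cluster) / total
    end = sum(e * s for _, e, s in cluster) / total
    return (start, end, total / len(cluster))

def weighted_boundary_fusion(predictions, iou_thr=0.5):
    """Fuse ensemble (start, end, score) predictions.

    Hypothetical sketch: predictions are visited in descending score
    order; each joins the first cluster whose fused segment overlaps it
    by at least iou_thr, otherwise it starts a new cluster.
    """
    clusters = []
    for start, end, score in sorted(predictions, key=lambda p: -p[2]):
        for cluster in clusters:
            if temporal_iou((start, end), fuse(cluster)[:2]) >= iou_thr:
                cluster.append((start, end, score))
                break
        else:
            clusters.append([(start, end, score)])
    return [fuse(c) for c in clusters]

# Example: the two heavily overlapping detections are merged into one segment.
preds = [(1.0, 3.0, 0.9), (1.2, 3.1, 0.8), (5.0, 7.0, 0.7)]
print(weighted_boundary_fusion(preds))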

Cite

Text

Zhang et al. "Augmented Self-Mask Attention Transformer for Naturalistic Driving Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00705

Markdown

[Zhang et al. "Augmented Self-Mask Attention Transformer for Naturalistic Driving Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/zhang2024cvprw-augmented/) doi:10.1109/CVPRW63382.2024.00705

BibTeX

@inproceedings{zhang2024cvprw-augmented,
  title     = {{Augmented Self-Mask Attention Transformer for Naturalistic Driving Action Recognition}},
  author    = {Zhang, Tiantian and Wang, Qingtian and Dong, Xiaodong and Yu, Wenqing and Sun, Hao and Zhou, Xuyang and Zhen, Aigong and Cui, Shun and Wu, Dong and He, Zhongjiang},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {7108--7114},
  doi       = {10.1109/CVPRW63382.2024.00705},
  url       = {https://mlanthology.org/cvprw/2024/zhang2024cvprw-augmented/}
}