MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving

Abstract

Recognizing risky driver behavior is an important visual recognition problem. In this paper, we propose a multi-view temporal action localization system that operates on grayscale video to recognize actions in naturalistic driving. Specifically, we adopt a Swin Transformer as the feature extractor, together with a single framework that detects action boundaries and classes simultaneously. We also improve multiple loss functions to explicitly constrain the distribution of embedded features. Our proposed framework achieves an overall F1-score of 0.3154 on the A2 dataset.
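Temporal action localization, as described in the abstract, produces time-stamped action segments rather than per-frame labels. The minimal sketch below illustrates that output format only: it merges per-frame class predictions into `(start, end, class)` segments. This is an illustrative assumption, not the paper's actual pipeline (which detects boundaries and classes jointly in one framework), and the function name `labels_to_segments` is hypothetical.

```python
def labels_to_segments(frame_labels, background=0):
    """Merge runs of identical per-frame labels into (start, end, cls) segments.

    `end` is exclusive; frames labeled `background` produce no segment.
    Illustrative only -- not the joint boundary/class detector from the paper.
    """
    segments = []
    start = None
    prev = background
    for i, cls in enumerate(frame_labels):
        if cls != prev:
            if prev != background:
                segments.append((start, i, prev))
            start = i
            prev = cls
    if prev != background:
        segments.append((start, len(frame_labels), prev))
    return segments

# Frames 2-4 belong to action class 1, frames 6-7 to action class 3.
print(labels_to_segments([0, 0, 1, 1, 1, 0, 3, 3]))
# → [(2, 5, 1), (6, 8, 3)]
```

Segment-level outputs of this shape are what the challenge's F1-score is computed over, by matching predicted segments against ground-truth segments.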

Cite

Text

Li et al. "MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00366

Markdown

[Li et al. "MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/li2022cvprw-mvtal/) doi:10.1109/CVPRW56347.2022.00366

BibTeX

@inproceedings{li2022cvprw-mvtal,
  title     = {{MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving}},
  author    = {Li, Wei and Chen, Shimin and Gu, Jianyang and Wang, Ning and Chen, Chen and Guo, Yandong},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {3241--3247},
  doi       = {10.1109/CVPRW56347.2022.00366},
  url       = {https://mlanthology.org/cvprw/2022/li2022cvprw-mvtal/}
}