MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving
Abstract
Risky human behavior while driving is an important visual recognition problem. In this paper, we propose a multi-view temporal action localization system based on grayscale video to achieve action recognition in naturalistic driving. Specifically, we adopt a Swin Transformer as the feature extractor, within a single framework that detects action boundaries and classes at the same time. We also improve multiple loss functions to impose explicit constraints on the embedded feature distributions. Our proposed framework achieves an overall F1-score of 0.3154 on the A2 dataset.
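The abstract describes fusing features from multiple camera views and feeding them to a single framework with simultaneous boundary and class prediction. Below is a minimal, hedged sketch of that idea; the view count, feature dimension, average fusion, and linear heads are all illustrative assumptions (the paper uses Swin Transformer features, for which random vectors stand in here), not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 camera views, T=16 clip features of dimension D=8.
# In the paper these would be Swin Transformer features per view.
num_views, T, D, num_classes = 3, 16, 8, 6
view_feats = rng.normal(size=(num_views, T, D))

# Multi-view fusion by simple averaging across views (one plausible choice;
# the abstract does not specify the exact fusion operator).
fused = view_feats.mean(axis=0)            # shape (T, D)

# Single framework: shared features feed two heads at the same time.
W_cls = rng.normal(size=(D, num_classes))  # classification head
W_bnd = rng.normal(size=(D, 2))            # boundary head: start/end scores

class_logits = fused @ W_cls               # (T, num_classes) per-step class scores
boundary_scores = fused @ W_bnd            # (T, 2) per-step start/end scores

# Clip-level class by pooling per-step logits over time.
pred_class = int(class_logits.sum(axis=0).argmax())
print(class_logits.shape, boundary_scores.shape, pred_class)
```

The key structural point is that both heads read the same fused representation, so localization (boundaries) and recognition (classes) are trained and inferred jointly rather than in separate stages.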
Cite
Text
Li et al. "MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00366
Markdown
[Li et al. "MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/li2022cvprw-mvtal/) doi:10.1109/CVPRW56347.2022.00366
BibTeX
@inproceedings{li2022cvprw-mvtal,
title = {{MV-TAL: Mulit-View Temporal Action Localization in Naturalistic Driving}},
author = {Li, Wei and Chen, Shimin and Gu, Jianyang and Wang, Ning and Chen, Chen and Guo, Yandong},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2022},
pages = {3241--3247},
doi = {10.1109/CVPRW56347.2022.00366},
url = {https://mlanthology.org/cvprw/2022/li2022cvprw-mvtal/}
}