Multi View Action Recognition for Distracted Driver Behavior Localization

Abstract

This paper presents our approach for Track 3 (Naturalistic Driving Action Recognition) of the 2023 AI City Challenge, where the objective is to classify distracted driving activities in each untrimmed naturalistic driving video and localize their precise temporal boundaries. Our solution relies on large-model fine-tuning to adapt a base video recognition model to a small-scale video dataset. We then adopt a multi-view, multi-fold ensemble to produce fine-grained clip-level classification results. Given the recognition probabilities, a non-trivial clustering-and-removing post-processing algorithm is applied to generate the final localization proposals. Extensive experiments demonstrate that the proposed method achieves superior performance compared with other methods and ranks 1st on the Test-A2 set of the challenge track.
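To make the pipeline concrete, below is a minimal sketch of the two inference-time steps described in the abstract: fusing clip-level probabilities across views and folds, and merging consecutive clips into temporal proposals while removing short ones. All names, shapes, and thresholds (e.g. `clip_len_s`, `min_clips`, the background class id) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of multi-view multi-fold ensembling and
# clustering-and-removing post-processing; parameters are assumptions.
import numpy as np


def ensemble_clip_probs(per_model_probs):
    """Average clip-level class probabilities over all view/fold models.

    per_model_probs: array of shape (num_models, num_clips, num_classes),
    where num_models = num_views * num_folds.
    Returns an array of shape (num_clips, num_classes).
    """
    return np.mean(per_model_probs, axis=0)


def clips_to_proposals(probs, clip_len_s=1.0, min_clips=2, background_id=0):
    """Merge consecutive clips sharing the same predicted class into
    (class, start_s, end_s) proposals and drop proposals shorter than
    min_clips clips -- a simple stand-in for the paper's post-processing."""
    labels = probs.argmax(axis=1)
    proposals = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            cls = int(labels[start])
            if cls != background_id and (i - start) >= min_clips:
                proposals.append((cls, start * clip_len_s, i * clip_len_s))
            start = i
    return proposals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # e.g. 3 views x 5 folds = 15 models, 60 one-second clips, 16 classes
    per_model_probs = rng.random((15, 60, 16))
    fused = ensemble_clip_probs(per_model_probs)
    print(clips_to_proposals(fused))
```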

Cite

Text

Zhou et al. "Multi View Action Recognition for Distracted Driver Behavior Localization." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00567

Markdown

[Zhou et al. "Multi View Action Recognition for Distracted Driver Behavior Localization." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/zhou2023cvprw-multi/) doi:10.1109/CVPRW59228.2023.00567

BibTeX

@inproceedings{zhou2023cvprw-multi,
  title     = {{Multi View Action Recognition for Distracted Driver Behavior Localization}},
  author    = {Zhou, Wei and Qian, Yinlong and Jie, Zequn and Ma, Lin},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {5375-5380},
  doi       = {10.1109/CVPRW59228.2023.00567},
  url       = {https://mlanthology.org/cvprw/2023/zhou2023cvprw-multi/}
}