Learning Generalized Feature for Temporal Action Detection: Application for Natural Driving Action Recognition Challenge

Abstract

This paper reports our approach for the 2022 AI City Challenge - Naturalistic Driving Action Recognition (Track 3), where the objective is to detect when a driver performs an action in a long, untrimmed video and which action it is. Our solution is built upon the single-stage ActionFormer detector, which predicts temporal location and class simultaneously for efficiency. The input features for the detector are extracted offline using our proposed backbone, which we name "ConvNeXt-Video". However, because the dataset is small, training the model without over-fitting is challenging. To address this problem, we focus on training techniques that improve the generalization of the underlying features. Specifically, we use two methods: "learning without forgetting" and semi-weak supervised learning on the unlabeled A2 data. Finally, we add a second-stage classifier (SSC) built on our ConvNeXt-Video backbone. The SSC is designed to combine information from multiple clips and multiple camera views to improve prediction precision. Our best result achieves a 29.1 F1 score on the public test set. Our source code is released at link.
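
The abstract does not say how the second-stage classifier merges information across clips and camera views. As an illustration only, the sketch below shows one simple late-fusion scheme: convert per-clip logits to probabilities, average them within each view, then average across views. The function name `fuse_predictions`, the view names, and the averaging rule itself are assumptions made for this example, not the authors' method.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(logits_by_view):
    """Hypothetical late fusion over clips and camera views.

    logits_by_view maps a view name to a list of per-clip logit lists,
    all covering the same candidate action segment.
    Returns (predicted_class_index, fused_probabilities).
    """
    view_means = []
    for clips in logits_by_view.values():
        if not clips:
            continue  # skip views that contributed no clips
        probs = [softmax(clip) for clip in clips]
        n_classes = len(probs[0])
        # Average over the clips of this view.
        view_means.append(
            [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
        )
    if not view_means:
        raise ValueError("no clip predictions to fuse")
    n_classes = len(view_means[0])
    # Average over views so each camera gets equal weight.
    fused = [
        sum(v[c] for v in view_means) / len(view_means) for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: fused[c])
    return best, fused


# Made-up logits for three hypothetical camera views and three classes.
example = {
    "cam_0": [[0.2, 2.5, 0.1], [0.4, 1.9, 0.3]],
    "cam_1": [[0.1, 2.0, 0.5]],
    "cam_2": [[0.3, 1.7, 0.2], [0.0, 2.2, 0.4]],
}
label, fused = fuse_predictions(example)
```

Averaging within a view before averaging across views keeps a camera that yields more clips from dominating the result; a learned classifier such as the SSC described above could weight clips and views differently.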

Cite

Text

Nguyen et al. "Learning Generalized Feature for Temporal Action Detection: Application for Natural Driving Action Recognition Challenge." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00367

Markdown

[Nguyen et al. "Learning Generalized Feature for Temporal Action Detection: Application for Natural Driving Action Recognition Challenge." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/nguyen2022cvprw-learning/) doi:10.1109/CVPRW56347.2022.00367

BibTeX

@inproceedings{nguyen2022cvprw-learning,
  title     = {{Learning Generalized Feature for Temporal Action Detection: Application for Natural Driving Action Recognition Challenge}},
  author    = {Nguyen, Chuong and Nguyen, Ngoc and Huynh, Su and Nguyen, Vinh and Nguyen, Son},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {3248-3255},
  doi       = {10.1109/CVPRW56347.2022.00367},
  url       = {https://mlanthology.org/cvprw/2022/nguyen2022cvprw-learning/}
}