Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos
Abstract
The task of Naturalistic Driving Action Recognition aims to detect and temporally localize distracting driving behavior in untrimmed videos. In this paper, we introduce our framework for Track 3 of the 8th AI City Challenge in 2024. The approach is primarily based on large model fine-tuning and ensemble techniques to train a set of action recognition models on a small-scale dataset. Starting with raw videos, we segment them into individual action sequences based on their annotation. We then fine-tune four different action recognition models, with K-fold cross-validation applied to the segmented data. Following this, we execute a multi-view ensemble, selecting the most visible camera views for each action class to generate clip-level classification results for each video. Finally, a multi-step post-processing algorithm, which is designed for the AI City Challenge dataset’s specific features, is employed to perform temporal action localization and produce temporal segments for the actions. Our solution achieves a final mOS score of 0.7798 and attains the 5th rank on the public leaderboard for the test set A2 of the challenge. The source code will be publicly available at https://github.com/SKKUAutoLab/AIC24-Track03.
Cite
Text
Nguyen et al. "Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00709Markdown
[Nguyen et al. "Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/nguyen2024cvprw-multiview/) doi:10.1109/CVPRW63382.2024.00709BibTeX
@inproceedings{nguyen2024cvprw-multiview,
title = {{Multi-View Spatial-Temporal Learning for Understanding Unusual Behaviors in Untrimmed Naturalistic Driving Videos}},
author = {Nguyen, Huy-Hung and Tran, Chi Dai and Pham, Long Hoang and Tran, Duong Nguyen-Ngoc and Tran, Tai Huu-Phuong and Vu, Duong Khac and Ho, Quoc Pham-Nam and Huynh, Ngoc Doan-Minh and Jeon, Hyung-Min and Jeon, Hyung-Joon and Jeon, Jae Wook},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {7144-7152},
doi = {10.1109/CVPRW63382.2024.00709},
url = {https://mlanthology.org/cvprw/2024/nguyen2024cvprw-multiview/}
}