Multi-Perspective Traffic Video Description Model with Fine-Grained Refinement Approach
Abstract
Analyzing traffic patterns is crucial for enhancing safety and optimizing flow in urban areas. While cities possess extensive camera networks for monitoring, the raw video data often lacks the contextual detail necessary for understanding complex traffic incidents and the behaviors of road users. In this paper, we propose a novel methodology for generating comprehensive descriptions of traffic scenarios, combining a vision-language model with rule-based refinements to capture pertinent pedestrian, vehicle, and environmental factors. First, a captioning model generates a general description from the processed video. This description is then refined sequentially through three primary modules: pedestrian-aware, vehicle-aware, and context-aware refinement, enhancing the final description. We evaluate our method on the Woven Traffic Safety dataset in Track 2 of the AI City Challenge 2024, obtaining competitive results with an S2 score of 22.6721. Code will be available at https://github.com/ToTuanAn/AICityChallenge2024_Track2
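As a rough illustration of the sequential refinement idea described above, the following Python sketch chains a base caption through pedestrian-aware, vehicle-aware, and context-aware stages. All function names and the placeholder return values are hypothetical stand-ins, not the paper's actual modules; see the linked repository for the real implementation.

from typing import Callable, List


def base_caption(video_frames: List) -> str:
    # Stand-in for the vision-language captioning model that produces a
    # general description of the processed video (hypothetical output).
    return "A pedestrian crosses the street while a vehicle approaches."


def pedestrian_aware_refine(caption: str, video_frames: List) -> str:
    # Stand-in for rule-based refinement of pedestrian-related details.
    return caption + " The pedestrian is an adult wearing dark clothing."


def vehicle_aware_refine(caption: str, video_frames: List) -> str:
    # Stand-in for rule-based refinement of vehicle-related details.
    return caption + " The vehicle is a sedan moving at moderate speed."


def context_aware_refine(caption: str, video_frames: List) -> str:
    # Stand-in for rule-based refinement of environmental context.
    return caption + " The scene is a daytime urban intersection."


def describe_traffic_video(video_frames: List) -> str:
    # Generate a general description, then refine it sequentially.
    caption = base_caption(video_frames)
    refiners: List[Callable[[str, List], str]] = [
        pedestrian_aware_refine,
        vehicle_aware_refine,
        context_aware_refine,
    ]
    for refine in refiners:
        caption = refine(caption, video_frames)
    return caption


if __name__ == "__main__":
    print(describe_traffic_video(video_frames=[]))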
Cite
Text
To et al. "Multi-Perspective Traffic Video Description Model with Fine-Grained Refinement Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00701
Markdown
[To et al. "Multi-Perspective Traffic Video Description Model with Fine-Grained Refinement Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/to2024cvprw-multiperspective/) doi:10.1109/CVPRW63382.2024.00701
BibTeX
@inproceedings{to2024cvprw-multiperspective,
title = {{Multi-Perspective Traffic Video Description Model with Fine-Grained Refinement Approach}},
author = {To, Tuan-An and Tran, Minh-Nam and Ho, Trong-Bao and Ha, Thien-Loc and Nguyen, Quang-Tan and Luong, Hoang-Chau and Cao, Thanh-Duy and Tran, Minh-Triet},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {7075--7084},
doi = {10.1109/CVPRW63382.2024.00701},
url = {https://mlanthology.org/cvprw/2024/to2024cvprw-multiperspective/}
}