DUN: Dual-Path Temporal Matching Network for Natural Language-Based Vehicle Retrieval

Abstract

Retrieving vehicles that match natural language descriptions from collections of videos is a novel and uniquely challenging task. It requires consideration not only of vehicle types and colors, but also of temporal relations, e.g., "A white crossover keeping straight behind a silver hatch-back." To perform this task, we propose the Dual-path Temporal Matching Network (DUN). DUN uses a pre-trained CNN and GloVe to extract visual and text features, respectively, and GRUs to mine temporal relationships in videos and sentences. The network's performance can be improved further by adding techniques such as re-ranking. Despite its simple structure, DUN achieved second place in Track 5 of the AI City Challenge 2021. The code is available at https://github.com/okzhili/AICITY2021_Track5_DUN.
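
To make the retrieval step concrete, the following minimal sketch ranks vehicle tracks against a text query once both paths have produced fixed-length feature vectors. It is an illustration only, not the authors' implementation: the use of cosine similarity as the matching score, the function names `cosine` and `rank_tracks`, and the toy three-dimensional vectors are all assumptions made for this example. The CNN, GloVe, and GRU encoders described above are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)


def rank_tracks(query_vec, track_vecs):
    """Return track ids sorted from best to worst match for the query.

    query_vec  : feature vector assumed to come from the text path
    track_vecs : dict of track id -> feature vector assumed to come
                 from the visual path
    """
    scores = {tid: cosine(query_vec, vec) for tid, vec in track_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy features (hypothetical values, chosen only to show the ranking).
query = [1.0, 0.0, 1.0]
tracks = {
    "track_a": [0.9, 0.1, 1.1],
    "track_b": [0.0, 1.0, 0.0],
    "track_c": [1.0, 1.0, 0.0],
}
ranking = rank_tracks(query, tracks)
print(ranking)
```

In a full system, a re-ranking stage such as the one mentioned in the abstract would take a list like `ranking` as its input and refine the order.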

Cite

Text

Sun et al. "DUN: Dual-Path Temporal Matching Network for Natural Language-Based Vehicle Retrieval." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00458

Markdown

[Sun et al. "DUN: Dual-Path Temporal Matching Network for Natural Language-Based Vehicle Retrieval." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/sun2021cvprw-dun/) doi:10.1109/CVPRW53098.2021.00458

BibTeX

@inproceedings{sun2021cvprw-dun,
  title     = {{DUN: Dual-Path Temporal Matching Network for Natural Language-Based Vehicle Retrieval}},
  author    = {Sun, Ziruo and Liu, Xinfang and Bi, Xiaopeng and Nie, Xiushan and Yin, Yilong},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2021},
  pages     = {4061--4067},
  doi       = {10.1109/CVPRW53098.2021.00458},
  url       = {https://mlanthology.org/cvprw/2021/sun2021cvprw-dun/}
}