Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge
Abstract
This paper presents our solution for Track 2 of the AI City Challenge 2023. The task is tracked-vehicle retrieval by natural language descriptions on a real-world dataset covering various scenarios and cameras. Our solution focuses on four points: (1) To address linguistic ambiguity in the language queries, we use our proposed standardized version of the text descriptions in the domain-adaptive training and post-processing stages. (2) Our baseline vehicle retrieval model uses CLIP to extract robust visual and textual feature representations and to learn unified cross-modal representations between them. (3) Our proposed semi-supervised domain adaptive (SSDA) training method addresses the domain gap between the training and test sets. (4) Finally, we propose a multi-contextual post-processing technique that prunes wrong results based on multi-contextual attribute information, which effectively boosts the final retrieval results. Our proposed framework achieves a competitive MRR of 82.63% on the test set, placing 1st in the competition. Code will be available at https://github.com/zef1611/AIC23_NLRetrieval_HCMIU_CVIP
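For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean reciprocal rank (MRR) is commonly computed for retrieval. It is not taken from the authors' code. The function name `mean_reciprocal_rank`, the track identifiers, and the assumption of exactly one ground-truth track per query are illustrative only.

```python
from typing import Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], targets: Sequence[str]
) -> float:
    """Average of 1/rank of the ground-truth item over all queries.

    A query whose target does not appear in its ranking contributes 0.
    Assumes (for illustration) one ground-truth track per query.
    """
    if len(rankings) != len(targets):
        raise ValueError("need exactly one target per ranking")
    if not targets:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, targets):
        # Ranks are 1-based: the top result has rank 1.
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(targets)


# Hypothetical example: three text queries, each ranking three tracks.
rankings = [
    ["t3", "t1", "t2"],  # target at rank 1
    ["t2", "t3", "t1"],  # target at rank 2
    ["t1", "t2", "t3"],  # target at rank 3
]
targets = ["t3", "t3", "t3"]
score = mean_reciprocal_rank(rankings, targets)
print(f"MRR = {score:.4f}")
```

Under this definition, a score near 1 means the correct track is usually ranked first, so pruning wrong candidates that sit above the correct track directly raises the score.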
Cite
Text
Le et al. "Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00583
Markdown
[Le et al. "Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/le2023cvprw-trackedvehicle/) doi:10.1109/CVPRW59228.2023.00583
BibTeX
@inproceedings{le2023cvprw-trackedvehicle,
title = {{Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge}},
author = {Le, Huy Dinh-Anh and Nguyen, Quang Qui-Vinh and Luu, Duc Trung and Chau, Truc Thi-Thanh and Chung, Nhat Minh and Ha, Synh Viet-Uyen},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2023},
pages = {5511-5519},
doi = {10.1109/CVPRW59228.2023.00583},
url = {https://mlanthology.org/cvprw/2023/le2023cvprw-trackedvehicle/}
}