All You Can Embed: Natural Language Based Vehicle Retrieval with Spatio-Temporal Transformers

Abstract

Combining natural language with vision is a distinctive challenge in artificial intelligence. Track 5 of the AI City Challenge, Natural Language-Based Vehicle Retrieval, addresses the problem of combining visual and textual information in a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution that correlates single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT, which embeds the textual descriptions, and (ii) a convolutional backbone combined with a Transformer model, which embeds the visual information. To train the retrieval model, we propose a variation of the Triplet Margin Loss that learns a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
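As a rough illustration of the training objective mentioned in the abstract, the sketch below implements the standard Triplet Margin Loss over plain Python vectors, together with a distance-based ranking of tracks for a text query. It is not the variation proposed in the paper, whose details the abstract does not give. The function names (`euclidean`, `triplet_margin_loss`, `rank_tracks`), the choice of Euclidean distance, and the default margin are all assumptions made for this example; in practice the embeddings would come from BERT and the visual encoder rather than hand-written lists.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not the paper's variant).

    anchor:   e.g. a text embedding
    positive: embedding of the matching vehicle track
    negative: embedding of a non-matching vehicle track
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def rank_tracks(text_embedding, track_embeddings):
    """Return track indices sorted from closest to farthest from the query."""
    return sorted(
        range(len(track_embeddings)),
        key=lambda i: euclidean(text_embedding, track_embeddings[i]),
    )


if __name__ == "__main__":
    query = [0.0, 0.0]
    tracks = [[3.0, 4.0], [0.0, 1.0], [1.0, 1.0]]
    print(triplet_margin_loss(query, tracks[1], tracks[0]))
    print(rank_tracks(query, tracks))
```

At retrieval time only the ranking step is needed: the learned distance orders candidate tracks for each description, which is why the loss is formulated directly on distances between the two embedding spaces.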

Cite

Text

Scribano et al. "All You Can Embed: Natural Language Based Vehicle Retrieval with Spatio-Temporal Transformers." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00481

Markdown

[Scribano et al. "All You Can Embed: Natural Language Based Vehicle Retrieval with Spatio-Temporal Transformers." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/scribano2021cvprw-all/) doi:10.1109/CVPRW53098.2021.00481

BibTeX

@inproceedings{scribano2021cvprw-all,
  title     = {{All You Can Embed: Natural Language Based Vehicle Retrieval with Spatio-Temporal Transformers}},
  author    = {Scribano, Carmelo and Sapienza, Davide and Franchini, Giorgia and Verucchi, Micaela and Bertogna, Marko},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2021},
  pages     = {4253--4262},
  doi       = {10.1109/CVPRW53098.2021.00481},
  url       = {https://mlanthology.org/cvprw/2021/scribano2021cvprw-all/}
}