Language-Agnostic Visual-Semantic Embeddings

Abstract

This paper proposes a framework for training language-invariant cross-modal retrieval models. We also introduce a novel character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage requirements for a text encoder decrease substantially, allowing for lighter and more scalable retrieval architectures. The proposed language-invariant textual encoder based on characters is virtually unaffected in terms of storage requirements when novel languages are added to the system. Our contributions include new methods for building character-level-based word-embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant, but also presents better predictive performance. We show that our models outperform the current state-of-the-art in both single and multi-language scenarios. This work can be seen as the basis of a new path on retrieval research, now allowing for the effective use of captions in multiple-language scenarios. Code is available at https://github.com/jwehrmann/lavse.
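The storage claim in the abstract can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper: the vocabulary size, character inventory, per-language character growth, and embedding dimension are all hypothetical figures chosen for illustration, and it counts only the embedding lookup table, not the rest of a text encoder.

```python
# Rough comparison of embedding-table storage for word-level vs.
# character-level text encoders. All constants are hypothetical
# illustration values, not numbers reported in the paper.

WORDS_PER_LANGUAGE = 10_000      # assumed word vocabulary per language
BASE_CHARACTERS = 200            # assumed character inventory for the first language
NEW_CHARS_PER_LANGUAGE = 20      # assumed extra characters per added language
EMBEDDING_DIM = 300              # assumed embedding dimension
BYTES_PER_PARAM = 4              # 32-bit floats


def embedding_table_bytes(num_symbols: int, dim: int = EMBEDDING_DIM) -> int:
    """Bytes needed for a dense table with one dim-sized vector per symbol."""
    return num_symbols * dim * BYTES_PER_PARAM


def word_level_bytes(num_languages: int) -> int:
    """Word-level table: each language contributes its own vocabulary."""
    return embedding_table_bytes(WORDS_PER_LANGUAGE * num_languages)


def char_level_bytes(num_languages: int) -> int:
    """Character-level table: languages mostly share one small inventory."""
    num_chars = BASE_CHARACTERS + NEW_CHARS_PER_LANGUAGE * (num_languages - 1)
    return embedding_table_bytes(num_chars)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, word_level_bytes(n), char_level_bytes(n))
```

Under these assumed figures the word-level table grows linearly with the number of languages, while the character-level table grows only by the handful of characters each new language introduces, which is the kind of behavior the abstract describes as "virtually unaffected".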

Cite

Text

Wehrmann et al. "Language-Agnostic Visual-Semantic Embeddings." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00590

Markdown

[Wehrmann et al. "Language-Agnostic Visual-Semantic Embeddings." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/wehrmann2019iccv-languageagnostic/) doi:10.1109/ICCV.2019.00590

BibTeX

@inproceedings{wehrmann2019iccv-languageagnostic,
  title     = {{Language-Agnostic Visual-Semantic Embeddings}},
  author    = {Wehrmann, Jonatas and Souza, Douglas M. and Lopes, Mauricio A. and Barros, Rodrigo C.},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00590},
  url       = {https://mlanthology.org/iccv/2019/wehrmann2019iccv-languageagnostic/}
}