Language-Agnostic Visual-Semantic Embeddings
Abstract
This paper proposes a framework for training language-invariant cross-modal retrieval models. We also introduce a novel character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage requirements for a text encoder decrease substantially, allowing for lighter and more scalable retrieval architectures. The proposed language-invariant textual encoder based on characters is virtually unaffected in terms of storage requirements when novel languages are added to the system. Our contributions include new methods for building character-level-based word-embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant, but also presents better predictive performance. We show that our models outperform the current state-of-the-art in both single and multi-language scenarios. This work can be seen as the basis of a new path on retrieval research, now allowing for the effective use of captions in multiple-language scenarios. Code is available at https://github.com/jwehrmann/lavse.
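The storage claim can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the vocabulary sizes, the embedding dimension, and the helper `embedding_table_params` are hypothetical, and it assumes that all languages share one character inventory while a word-level encoder needs a separate vocabulary entry for each word in each language.

```python
def embedding_table_params(vocab_size: int, dim: int) -> int:
    """Parameters in a lookup table holding one dim-sized vector per symbol."""
    return vocab_size * dim


# Hypothetical sizes chosen only for illustration (not reported in the paper).
WORDS_PER_LANGUAGE = 10_000   # assumed word vocabulary per language
SHARED_CHARACTERS = 200       # assumed character inventory shared by all languages
EMBEDDING_DIM = 300           # assumed embedding width

for num_languages in (1, 2, 4):
    # A word-level table grows with every language that is added.
    word_level = embedding_table_params(WORDS_PER_LANGUAGE * num_languages, EMBEDDING_DIM)
    # A character-level table stays the same size under the shared-inventory assumption.
    char_level = embedding_table_params(SHARED_CHARACTERS, EMBEDDING_DIM)
    print(f"languages={num_languages} word_level={word_level} char_level={char_level}")
```

Under these assumptions the word-level table grows linearly with the number of languages, while the character-level table stays constant. In practice a new language may add some characters, so the character-level table would grow slightly rather than not at all.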
Cite
Text
Wehrmann et al. "Language-Agnostic Visual-Semantic Embeddings." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00590
Markdown
[Wehrmann et al. "Language-Agnostic Visual-Semantic Embeddings." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/wehrmann2019iccv-languageagnostic/) doi:10.1109/ICCV.2019.00590
BibTeX
@inproceedings{wehrmann2019iccv-languageagnostic,
title = {{Language-Agnostic Visual-Semantic Embeddings}},
author = {Wehrmann, Jonatas and Souza, Douglas M. and Lopes, Mauricio A. and Barros, Rodrigo C.},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
year = {2019},
doi = {10.1109/ICCV.2019.00590},
url = {https://mlanthology.org/iccv/2019/wehrmann2019iccv-languageagnostic/}
}