Towards Cycle-Consistent Models for Text and Image Retrieval
Abstract
Cross-modal retrieval has recently become a hot research topic, thanks to the development of deep learning architectures. Such architectures generally learn a joint multi-modal embedding space in which text and images can be projected and compared. Here we investigate a different approach and reformulate the problem of cross-modal retrieval as that of learning a translation between the textual and visual domains. In particular, we propose an end-to-end trainable model which can translate text into image features and vice versa, and which regularizes this mapping with a cycle-consistency criterion. Preliminary experimental evaluations show promising results with respect to ordinary visual-semantic models.
Cite
Text
Cornia et al. "Towards Cycle-Consistent Models for Text and Image Retrieval." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_58
Markdown
[Cornia et al. "Towards Cycle-Consistent Models for Text and Image Retrieval." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/cornia2018eccvw-cycleconsistent/) doi:10.1007/978-3-030-11018-5_58
BibTeX
@inproceedings{cornia2018eccvw-cycleconsistent,
title = {{Towards Cycle-Consistent Models for Text and Image Retrieval}},
author = {Cornia, Marcella and Baraldi, Lorenzo and Tavakoli, Hamed R. and Cucchiara, Rita},
booktitle = {European Conference on Computer Vision Workshops},
year = {2018},
  pages = {687--691},
doi = {10.1007/978-3-030-11018-5_58},
url = {https://mlanthology.org/eccvw/2018/cornia2018eccvw-cycleconsistent/}
}