Cross-Modal Common Representation Learning by Hybrid Transfer Network
Abstract
Cross-modal retrieval based on deep neural networks (DNNs) is an active research topic that aims to retrieve across different modalities, such as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In the single-modal scenario, a similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (such as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, because it can provide rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from a single-modal (e.g., image) source domain to a cross-modal (e.g., image/text) target domain. Knowledge in the source domain cannot be directly transferred to both modalities in the target domain, and the inherent cross-modal correlation contained in the target domain provides key hints for cross-modal retrieval, which should be preserved during the transfer process. This paper proposes the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: the modal-sharing transfer subnetwork uses the modality shared by the source and target domains as a bridge to transfer knowledge to both modalities simultaneously; the layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. CHTN converts cross-modal data to a common representation for retrieval, and comprehensive experiments on 3 datasets show its effectiveness.
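The abstract states that cross-modal data are converted to a common representation for retrieval. As a rough illustration of what retrieval in such a shared space can look like, the sketch below ranks text candidates against an image query. It assumes cosine similarity as the ranking measure, which the abstract does not specify; the function names and the vectors are hypothetical, and the numbers are made up rather than produced by CHTN.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Hypothetical common representations (illustrative values, not from the paper).
# In practice these would be the outputs of the image and text pathways.
image_query = [0.9, 0.1, 0.0]
text_candidates = [
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
]

# Image-to-text retrieval: rank all text candidates for one image query.
ranking = rank_by_similarity(image_query, text_candidates)
```

Because both modalities live in the same space, swapping the roles of the query and the candidates gives text-to-image retrieval with no change to the ranking code.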
Cite
Text
Huang et al. "Cross-Modal Common Representation Learning by Hybrid Transfer Network." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/263
Markdown
[Huang et al. "Cross-Modal Common Representation Learning by Hybrid Transfer Network." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/huang2017ijcai-cross/) doi:10.24963/IJCAI.2017/263
BibTeX
@inproceedings{huang2017ijcai-cross,
title = {{Cross-Modal Common Representation Learning by Hybrid Transfer Network}},
author = {Huang, Xin and Peng, Yuxin and Yuan, Mingkuan},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2017},
pages = {1893-1900},
doi = {10.24963/IJCAI.2017/263},
url = {https://mlanthology.org/ijcai/2017/huang2017ijcai-cross/}
}