Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval
Abstract
Most recent approaches to zero-shot cross-modal image retrieval use a pre-trained model to map images from different modalities into a uniform feature space and exploit their relevance there. Based on the observation that the manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted when training a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing the heterogeneous manifolds in the feature space of each modality. The proposed method exploits intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieve a substantial improvement on the thermal vs. visible image retrieval task. The code for this paper is available at: https://github.com/fyang93/cross-modal-retrieval
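The abstract describes the general idea of refining noisy cross-modal similarities by walking over the intra-modal manifold of each modality. The paper's exact bi-directional scheme is in the linked repository; the snippet below is only a minimal NumPy sketch of the underlying ingredient (similarity diffusion over an intra-modal kNN graph), with all function names and parameters chosen for illustration rather than taken from the authors' implementation.

```python
import numpy as np

def knn_affinity(feats, k=10):
    """Row-stochastic kNN affinity matrix within one modality (cosine similarity)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, 0.0)
    # keep only the k largest similarities per row, zero out the rest
    drop = np.argsort(-sim, axis=1)[:, k:]
    np.put_along_axis(sim, drop, 0.0, axis=1)
    sim = np.clip(sim, 0.0, None)
    return sim / np.maximum(sim.sum(axis=1, keepdims=True), 1e-12)

def cross_modal_walk(query_feats, gallery_feats, gallery_affinity, steps=2, alpha=0.8):
    """Refine cross-modal scores by diffusing them over the gallery's intra-modal
    manifold (a plain random-walk smoothing, not the paper's bi-directional scheme)."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    scores = q @ g.T                      # initial noisy cross-modal similarities
    refined = scores.copy()
    for _ in range(steps):
        refined = alpha * refined @ gallery_affinity + (1 - alpha) * scores
    return refined                        # higher score = better match

# toy usage: 5 thermal queries against 100 visible gallery images, 64-d features
rng = np.random.default_rng(0)
queries, gallery = rng.normal(size=(5, 64)), rng.normal(size=(100, 64))
W = knn_affinity(gallery, k=10)
ranking = np.argsort(-cross_modal_walk(queries, gallery, W), axis=1)
```

The intra-modal affinity matrix captures the manifold structure of the gallery modality, so repeatedly multiplying by it pulls the cross-modal scores of manifold neighbors together, which is the kind of intra-modal smoothing the abstract appeals to.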
Cite
Text
Yang et al. "Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I07.6949
Markdown
[Yang et al. "Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/yang2020aaai-mining/) doi:10.1609/AAAI.V34I07.6949
BibTeX
@inproceedings{yang2020aaai-mining,
title = {{Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval}},
author = {Yang, Fan and Wang, Zheng and Xiao, Jing and Satoh, Shin'ichi},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2020},
pages = {12589-12596},
doi = {10.1609/AAAI.V34I07.6949},
url = {https://mlanthology.org/aaai/2020/yang2020aaai-mining/}
}