Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval
Abstract
Cross-modal hashing encodes the multimedia data into a common binary hash space in which the correlations among the samples from different modalities can be effectively measured. Deep cross-modal hashing further improves the retrieval performance as the deep neural networks can generate more semantic relevant features and hash codes. In this paper, we study the unsupervised deep cross-modal hash coding and propose Deep Joint-Semantics Reconstructing Hashing (DJSRH), which has the following two main advantages. First, to learn binary codes that preserve the neighborhood structure of the original data, DJSRH constructs a novel joint-semantics affinity matrix which elaborately integrates the original neighborhood information from different modalities and accordingly is capable to capture the latent intrinsic semantic affinity for the input multi-modal instances. Second, DJSRH later trains the networks to generate binary codes that maximally reconstruct above joint-semantics relations via the proposed reconstructing framework, which is more competent for the batch-wise training as it reconstructs the specific similarity value unlike the common Laplacian constraint merely preserving the similarity order. Extensive experiments demonstrate the significant improvement by DJSRH in various cross-modal retrieval tasks.
Cite
Text
Su et al. "Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00312Markdown
[Su et al. "Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/su2019iccv-deep/) doi:10.1109/ICCV.2019.00312BibTeX
@inproceedings{su2019iccv-deep,
title = {{Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval}},
author = {Su, Shupeng and Zhong, Zhisheng and Zhang, Chao},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
year = {2019},
doi = {10.1109/ICCV.2019.00312},
url = {https://mlanthology.org/iccv/2019/su2019iccv-deep/}
}