Cross-Modal Deep Variational Hashing

Abstract

In this paper, we propose a cross-modal deep variational hashing (CMDVH) method to learn compact binary codes for cross-modality multimedia retrieval. Unlike most existing cross-modal hashing methods, which learn a single pair of projections to map each example into a binary vector, we design a deep fusion neural network that learns non-linear transformations from image-text input pairs, so that a unified binary code is obtained in a discrete and discriminative manner using a classification-based hinge-loss criterion. We then design modality-specific neural networks in a probabilistic manner, modeling a latent variable that is kept as close as possible to the inferred binary codes while being approximated by a posterior distribution regularized by a known prior, which makes the model suitable for out-of-sample extension. Experimental results on three benchmark datasets show the efficacy of the proposed approach.
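The abstract's second stage can be pictured as a VAE-style objective: a modality-specific network outputs the parameters of a Gaussian posterior over a latent variable, which is pulled toward the unified binary code while a KL term regularizes the posterior toward a standard-normal prior. The sketch below is a minimal illustration of that idea, not the paper's exact loss; the function names, the squared-error fitting term, and the `beta` weight are all assumptions for illustration.

```python
import math

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions,
    # using the closed form 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2).
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )

def latent_loss(b, mu, log_var, beta=1.0):
    # b: unified binary code from the fusion network, entries in {-1, +1}
    # mu, log_var: posterior parameters from a modality-specific network
    # Fitting term pulls the latent mean toward the binary code;
    # beta weights the KL prior regularizer (hypothetical combination).
    fit = sum((bi - mi) ** 2 for bi, mi in zip(b, mu))
    return fit + beta * kl_to_standard_normal(mu, log_var)
```

At test time an out-of-sample item from either modality can be hashed by passing it through its modality-specific network alone and taking the sign of the latent mean, which is what makes the probabilistic formulation convenient for retrieval.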

Cite

Text

Liong et al. "Cross-Modal Deep Variational Hashing." International Conference on Computer Vision, 2017. doi:10.1109/ICCV.2017.439

Markdown

[Liong et al. "Cross-Modal Deep Variational Hashing." International Conference on Computer Vision, 2017.](https://mlanthology.org/iccv/2017/liong2017iccv-crossmodal/) doi:10.1109/ICCV.2017.439

BibTeX

@inproceedings{liong2017iccv-crossmodal,
  title     = {{Cross-Modal Deep Variational Hashing}},
  author    = {Liong, Venice Erin and Lu, Jiwen and Tan, Yap-Peng and Zhou, Jie},
  booktitle = {International Conference on Computer Vision},
  year      = {2017},
  doi       = {10.1109/ICCV.2017.439},
  url       = {https://mlanthology.org/iccv/2017/liong2017iccv-crossmodal/}
}