Improved Multimodal Deep Learning with Variation of Information
Abstract
Deep learning has been successfully applied to multimodal representation learning problems, a common strategy being to learn joint representations shared across multiple modalities on top of layers of modality-specific networks. Nonetheless, the question remains of how to learn a good association between data modalities; in particular, a good generative model of multimodal data should be able to reason about a missing data modality given the rest. In this paper, we propose a novel multimodal representation learning framework that explicitly aims at this goal. Rather than learning with maximum likelihood, we train the model to minimize the variation of information. We provide a theoretical insight into why the proposed learning objective is sufficient to estimate the data-generating joint distribution of multimodal data. We apply our method to restricted Boltzmann machines and introduce learning methods based on contrastive divergence and multi-prediction training. In addition, we extend our method to deep networks with a recurrent encoding structure to finetune the whole network. In experiments, we demonstrate state-of-the-art visual recognition performance on the MIR-Flickr and PASCAL VOC 2007 databases with and without text features.
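For context, the abstract's learning objective can be read off from the standard information-theoretic definition of variation of information, which decomposes into the two conditional entropies; minimizing it over model parameters amounts to maximizing both conditional log-likelihoods under the data distribution. The sketch below uses our own notation (P_D for the data distribution, P_theta for the model) and is a reading of the standard definition, not an excerpt from the paper.

```latex
% Variation of information between modalities X and Y (standard definition):
%   VI(X, Y) = H(X | Y) + H(Y | X)
% Minimizing it over model parameters \theta is equivalent (up to data-entropy
% constants) to maximizing the sum of conditional log-likelihoods in both directions.
\begin{align}
\mathrm{VI}(X, Y) &= H(X \mid Y) + H(Y \mid X) \\
\min_{\theta}\, \mathrm{VI}_{\theta}(X, Y)
  \;&\equiv\; \max_{\theta}\;
     \mathbb{E}_{P_D(x,\, y)}\!\left[ \log P_{\theta}(x \mid y) + \log P_{\theta}(y \mid x) \right]
\end{align}
```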
Cite
Text
Sohn et al. "Improved Multimodal Deep Learning with Variation of Information." Neural Information Processing Systems, 2014.
Markdown
[Sohn et al. "Improved Multimodal Deep Learning with Variation of Information." Neural Information Processing Systems, 2014.](https://mlanthology.org/neurips/2014/sohn2014neurips-improved/)
BibTeX
@inproceedings{sohn2014neurips-improved,
title = {{Improved Multimodal Deep Learning with Variation of Information}},
author = {Sohn, Kihyuk and Shang, Wenling and Lee, Honglak},
booktitle = {Neural Information Processing Systems},
year = {2014},
pages = {2141-2149},
url = {https://mlanthology.org/neurips/2014/sohn2014neurips-improved/}
}