Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion

Abstract

Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap that heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Because the distributions of different modalities differ in nature, we reduce the modality gap by translating the distributions of the source modalities into that of the target modality via their respective encoders, using adversarial training. Furthermore, we impose additional constraints on the embedding space by introducing a reconstruction loss and a classification loss. We then fuse the encoded representations using a hierarchical graph neural network that explicitly explores unimodal, bimodal and trimodal interactions in multiple stages. Our method achieves state-of-the-art performance on multiple datasets. Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative.
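The abstract names three training signals for the encoders: an adversarial term, a reconstruction loss and a classification loss. The following is a minimal sketch of how such terms might be combined, not the paper's actual formulation, which the abstract does not give. The function names, the weights `w_rec` and `w_cls`, the GAN-style labelling (target embeddings labelled 1, source embeddings labelled 0) and the choice of mean squared error and cross-entropy are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # numerical guard for log(0)


def bce(p, label):
    """Binary cross-entropy for a single discriminator output p in (0, 1)."""
    return -(label * math.log(p + EPS) + (1.0 - label) * math.log(1.0 - p + EPS))


def mse(x, y):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cross_entropy(probs, target):
    """Cross-entropy of predicted class probabilities against an integer label."""
    return -math.log(probs[target] + EPS)


def discriminator_objective(d_out_target, d_out_source):
    """Assumed discriminator loss: label target embeddings 1, source embeddings 0."""
    return bce(d_out_target, 1.0) + bce(d_out_source, 0.0)


def source_encoder_objective(d_out_source, x, x_recon, class_probs, label,
                             w_rec=1.0, w_cls=1.0):
    """Assumed source-encoder loss (hypothetical weights w_rec, w_cls).

    The adversarial term rewards source embeddings that the discriminator
    scores as coming from the target modality; the other two terms are the
    reconstruction and classification constraints on the embedding space.
    """
    adv = bce(d_out_source, 1.0)
    rec = mse(x, x_recon)
    cls = cross_entropy(class_probs, label)
    return adv + w_rec * rec + w_cls * cls


# Toy usage with made-up numbers: one source sample, two classes.
loss = source_encoder_objective(
    d_out_source=0.3,          # discriminator's "is target" score for the source embedding
    x=[1.0, 2.0, 3.0],         # original source features
    x_recon=[0.9, 2.1, 2.8],   # decoder output
    class_probs=[0.7, 0.3],    # classifier output
    label=0,
)
```

In this sketch the discriminator and the source encoder pull in opposite directions on `d_out_source`, which is what drives the source embedding distribution toward the target one; the hierarchical graph fusion stage described in the abstract is not modelled here.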

Cite

Text

Mai et al. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I01.5347

Markdown

[Mai et al. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/mai2020aaai-modality/) doi:10.1609/AAAI.V34I01.5347

BibTeX

@inproceedings{mai2020aaai-modality,
  title     = {{Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion}},
  author    = {Mai, Sijie and Hu, Haifeng and Xing, Songlong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {164--172},
  doi       = {10.1609/AAAI.V34I01.5347},
  url       = {https://mlanthology.org/aaai/2020/mai2020aaai-modality/}
}