What Makes Multi-Modal Learning Better than Single (Provably)

Abstract

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, building on the success of deep learning, an influential line of work on deep multi-modal learning has achieved remarkable empirical results on various applications. However, theoretical justifications in this field are notably lacking. Can multi-modal learning provably perform better than uni-modal? In this paper, we answer this question under one of the most popular multi-modal fusion frameworks, which first encodes features from different modalities into a common latent space and then maps the latent representations into the task space. We prove that learning with multiple modalities achieves a smaller population risk than using only a subset of those modalities. The main intuition is that the former has a more accurate estimate of the latent space representation. To the best of our knowledge, this is the first theoretical treatment to capture important qualitative phenomena observed in real multi-modal applications from the generalization perspective. Combined with experimental results, we show that multi-modal learning does possess an appealing formal guarantee.
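The fusion framework the abstract describes — modality-specific encoders into a common latent space, followed by a shared task head — can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the linear encoders, mean fusion, and all dimensions are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: two modalities of size d1, d2,
# a common latent space of size d_latent, a task space of size d_task.
d1, d2, d_latent, d_task = 8, 5, 4, 3

# Modality-specific encoders (plain linear maps for illustration).
W1 = rng.standard_normal((d_latent, d1))
W2 = rng.standard_normal((d_latent, d2))
# Shared head mapping the latent representation to the task space.
Wh = rng.standard_normal((d_task, d_latent))

def encode_and_predict(x1=None, x2=None):
    """Fuse the available modality encodings in the common latent space,
    then map the fused representation to the task space."""
    zs = []
    if x1 is not None:
        zs.append(W1 @ x1)
    if x2 is not None:
        zs.append(W2 @ x2)
    z = np.mean(zs, axis=0)  # fused latent representation
    return Wh @ z            # task-space output

x1 = rng.standard_normal(d1)
x2 = rng.standard_normal(d2)

y_multi = encode_and_predict(x1, x2)  # uses both modalities
y_uni = encode_and_predict(x1)        # uses only the first modality
assert y_multi.shape == y_uni.shape == (d_task,)
```

The paper's claim, informally, is that the multi-modal path (both encoders active) yields a better estimate of the latent representation `z`, and hence a smaller population risk, than any uni-modal restriction.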

Cite

Text

Huang et al. "What Makes Multi-Modal Learning Better than Single (Provably)." Neural Information Processing Systems, 2021.

Markdown

[Huang et al. "What Makes Multi-Modal Learning Better than Single (Provably)." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/huang2021neurips-makes/)

BibTeX

@inproceedings{huang2021neurips-makes,
  title     = {{What Makes Multi-Modal Learning Better than Single (Provably)}},
  author    = {Huang, Yu and Du, Chenzhuang and Xue, Zihui and Chen, Xuanyao and Zhao, Hang and Huang, Longbo},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/huang2021neurips-makes/}
}