M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification

Abstract

Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities. However, real-world data often exhibits shared relations beyond simple pairwise associations. We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data. Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with the corresponding samples from other modalities. For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss. Through extensive experiments on diverse datasets (N24News, ROSMAP, BRCA, and Food-101), we demonstrate that M3CoL effectively captures shared multimodal relations and generalizes across domains. It outperforms state-of-the-art methods on N24News, ROSMAP, and BRCA, while achieving comparable performance on Food-101. Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research.
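The Mixup-based contrastive loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes two modalities with paired embeddings, mixes at the embedding level for simplicity, and uses soft targets that give weight `lam` to each sample's own partner and `1 - lam` to the partner of the sample it was mixed with. The function name `mixup_contrastive_loss`, the `temperature` value, and the single-direction (A to B) formulation are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def mixup_contrastive_loss(z_a, z_b, lam, perm, temperature=0.1):
    """Illustrative loss aligning mixed modality-A samples with modality B.

    z_a, z_b : (N, D) arrays of paired embeddings (row i of each is a pair).
    lam      : Mixup coefficient in [0, 1].
    perm     : length-N permutation choosing each sample's mixing partner.
    """
    n = z_a.shape[0]
    # Mix each modality-A embedding with its partner (assumed: embedding-level mixing).
    z_mix = l2_normalize(lam * z_a + (1.0 - lam) * z_a[perm])
    z_b = l2_normalize(z_b)

    # Similarity of every mixed sample to every modality-B sample.
    logits = z_mix @ z_b.T / temperature

    # Soft targets: lam on the sample's own pair, (1 - lam) on the partner's pair.
    targets = lam * np.eye(n)
    targets[np.arange(n), perm] += 1.0 - lam

    # Cross-entropy between the soft targets and the softmax over similarities.
    return float(-(targets * log_softmax(logits)).sum(axis=1).mean())


rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
z_b = rng.normal(size=(8, 16))
perm = rng.permutation(8)
loss = mixup_contrastive_loss(z_a, z_b, lam=0.7, perm=perm)
```

With `lam = 1` the targets reduce to the identity matrix and the sketch becomes an ordinary one-to-one contrastive loss; intermediate values of `lam` are what let a mixed sample be pulled toward more than one sample of the other modality.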

Cite

Text

Kumar et al. "M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification." NeurIPS 2024 Workshops: UniReps, 2024.

Markdown

[Kumar et al. "M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification." NeurIPS 2024 Workshops: UniReps, 2024.](https://mlanthology.org/neuripsw/2024/kumar2024neuripsw-m3col/)

BibTeX

@inproceedings{kumar2024neuripsw-m3col,
  title     = {{M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification}},
  author    = {Kumar, Raja and Singhal, Raghav and Kulkarni, Pranamya Prashant and Mehta, Deval and Jadhav, Kshitij Sharad},
  booktitle = {NeurIPS 2024 Workshops: UniReps},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/kumar2024neuripsw-m3col/}
}