M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Abstract
Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities. However, real-world data often exhibits shared relations beyond simple pairwise associations. We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data. Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with the corresponding samples from other modalities. For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss. Through extensive experiments on diverse datasets (N24News, ROSMAP, BRCA, and Food-101), we demonstrate that M3CoL effectively captures shared multimodal relations and generalizes across domains. It outperforms state-of-the-art methods on N24News, ROSMAP, and BRCA, while achieving comparable performance on Food-101. Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research.
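As a rough illustration of the Mixup-based contrastive idea described above, the sketch below mixes pairs of embeddings from one modality and aligns each mixed embedding with the embeddings of both source samples in the other modality. The abstract does not state the loss formula, so everything specific here is an assumption rather than the paper's definition: the convex mixing coefficient `lam`, the soft targets (`lam` and `1 - lam`), the temperature-scaled cosine similarity, and all function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def mixup_contrastive_loss(mod_a, mod_b, lam, perm, temperature=0.1):
    """Assumed formulation of a mixup-based contrastive loss.

    mod_a, mod_b: lists of embedding vectors; index i in both lists
                  belongs to the same underlying sample.
    lam:          mixing coefficient in [0, 1].
    perm:         perm[i] is the index of the sample mixed with sample i.
    """
    b_norm = [l2_normalize(v) for v in mod_b]
    total = 0.0
    for i, j in enumerate(perm):
        # Mix sample i with sample j inside modality A.
        mixed = [lam * x + (1.0 - lam) * y for x, y in zip(mod_a[i], mod_a[j])]
        mixed = l2_normalize(mixed)
        # Compare the mixed embedding with every embedding of modality B.
        logits = [dot(mixed, b) / temperature for b in b_norm]
        logp = log_softmax(logits)
        # Soft targets: weight lam on sample i, weight (1 - lam) on sample j.
        total += -(lam * logp[i] + (1.0 - lam) * logp[j])
    return total / len(perm)


if __name__ == "__main__":
    # Toy batch: three samples with identical one-hot embeddings per modality.
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(mixup_contrastive_loss(eye, eye, lam=0.5, perm=[1, 2, 0]))
```

With `lam = 1.0` this sketch reduces to an ordinary one-to-one contrastive (InfoNCE-style) loss; values of `lam` strictly between 0 and 1 ask each mixed embedding to relate to two samples of the other modality at once, which is one way to read "shared relations beyond simple pairwise associations".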
Cite
Text
Kumar et al. "M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification." NeurIPS 2024 Workshops: UniReps, 2024.
Markdown
[Kumar et al. "M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification." NeurIPS 2024 Workshops: UniReps, 2024.](https://mlanthology.org/neuripsw/2024/kumar2024neuripsw-m3col/)
BibTeX
@inproceedings{kumar2024neuripsw-m3col,
title = {{M3CoL: Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification}},
author = {Kumar, Raja and Singhal, Raghav and Kulkarni, Pranamya Prashant and Mehta, Deval and Jadhav, Kshitij Sharad},
booktitle = {NeurIPS 2024 Workshops: UniReps},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/kumar2024neuripsw-m3col/}
}