G2D: Boosting Multimodal Learning with Gradient-Guided Distillation

Abstract

Multimodal learning aims to leverage information from diverse data modalities to achieve more comprehensive performance. However, conventional multimodal models often suffer from modality imbalance, where one or a few modalities dominate model optimization, leading to suboptimal feature representation and underutilization of weak modalities. To address this challenge, we introduce Gradient-Guided Distillation (G^ 2 D), a knowledge distillation framework that optimizes the multimodal model with a custom-built loss function that fuses both unimodal and multimodal objectives. G^ 2 D further incorporates a dynamic sequential modality prioritization (SMP) technique in the learning process to ensure each modality leads the learning process, avoiding the pitfall of stronger modalities overshadowing weaker ones. We validate G^ 2 D on multiple real-world datasets and show that G^ 2 D amplifies the significance of weak modalities while training and outperforms state-of-the-art methods in classification and regression tasks. Our code is available \href https://github.com/rAIson-Lab/G2D here .

Cite

Text

Rakib and Bagavathi. "G2D: Boosting Multimodal Learning with Gradient-Guided Distillation." International Conference on Computer Vision, 2025.

Markdown

[Rakib and Bagavathi. "G2D: Boosting Multimodal Learning with Gradient-Guided Distillation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/rakib2025iccv-g2d/)

BibTeX

@inproceedings{rakib2025iccv-g2d,
  title     = {{G2D: Boosting Multimodal Learning with Gradient-Guided Distillation}},
  author    = {Rakib, Mohammed and Bagavathi, Arunkumar},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {4059-4068},
  url       = {https://mlanthology.org/iccv/2025/rakib2025iccv-g2d/}
}