Unbiased Missing-Modality Multimodal Learning
Abstract
Recovering missing modalities in multimodal learning has recently been approached using diffusion models to synthesize absent data conditioned on available modalities. However, existing methods often suffer from modality generation bias: while certain modalities are generated with high fidelity, others--such as video--remain challenging due to intrinsic modality gaps, leading to imbalanced training. To address this issue, we propose MD^2N (Multi-stage Duplex Diffusion Network), a novel framework for unbiased missing-modality recovery. MD^2N introduces a modality transfer module within a duplex diffusion architecture, enabling bidirectional generation between available and missing modalities through three stages: (1) global structure generation, (2) modality transfer, and (3) local cross-modal refinement. Under duplex diffusion training, the available and missing modalities generate each other in an intersecting manner, effectively reaching a balanced generation state. Extensive experiments demonstrate that MD^2N significantly outperforms existing state-of-the-art methods, achieving up to 4% improvement over IMDer on the CMU-MOSEI dataset. Project page: https://crystal-punk.github.io/.
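To make the duplex idea concrete, below is a minimal sketch of a two-branch diffusion setup in which each branch denoises one modality while being conditioned on the other modality passed through a transfer projection, and the two denoising losses are summed so neither direction dominates. All module names, dimensions, and the simplified noise schedule are illustrative assumptions, not the MD^2N implementation.

```python
# Illustrative sketch only: hypothetical layout, not the authors' code.
import torch
import torch.nn as nn


class Denoiser(nn.Module):
    """Tiny MLP denoiser that predicts the noise added to a feature vector."""
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim + 1, 256), nn.SiLU(), nn.Linear(256, dim)
        )

    def forward(self, x_t, cond, t):
        # Concatenate noisy features, cross-modal conditioning, and timestep.
        t_emb = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x_t, cond, t_emb], dim=-1))


class DuplexDiffusion(nn.Module):
    """Two denoising branches that generate each other's modality,
    linked by modality-transfer projections (assumed structure)."""
    def __init__(self, dim_a: int = 128, dim_b: int = 128):
        super().__init__()
        # Modality transfer: map features across the modality gap.
        self.transfer_a2b = nn.Linear(dim_a, dim_b)
        self.transfer_b2a = nn.Linear(dim_b, dim_a)
        # One denoiser per direction (A conditions B, B conditions A).
        self.denoise_b = Denoiser(dim_b, dim_b)
        self.denoise_a = Denoiser(dim_a, dim_a)

    def duplex_loss(self, feat_a, feat_b, t):
        """Symmetric denoising loss so both generation directions are trained."""
        noise_a, noise_b = torch.randn_like(feat_a), torch.randn_like(feat_b)
        alpha = (1.0 - t.float() / 1000.0).unsqueeze(-1)  # toy linear schedule
        noisy_a = alpha.sqrt() * feat_a + (1 - alpha).sqrt() * noise_a
        noisy_b = alpha.sqrt() * feat_b + (1 - alpha).sqrt() * noise_b
        # Each branch is conditioned on the *transferred* other modality.
        pred_b = self.denoise_b(noisy_b, self.transfer_a2b(feat_a), t)
        pred_a = self.denoise_a(noisy_a, self.transfer_b2a(feat_b), t)
        return (nn.functional.mse_loss(pred_b, noise_b)
                + nn.functional.mse_loss(pred_a, noise_a))


model = DuplexDiffusion()
a, b = torch.randn(4, 128), torch.randn(4, 128)
t = torch.randint(0, 1000, (4,))
print(model.duplex_loss(a, b, t).item())
```

In this toy form, summing the two directional losses is what stands in for the paper's balanced (unbiased) generation objective; the staged global-structure and local-refinement steps described in the abstract are not modeled here.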
Cite
Text
Dai et al. "Unbiased Missing-Modality Multimodal Learning." International Conference on Computer Vision, 2025.

Markdown

[Dai et al. "Unbiased Missing-Modality Multimodal Learning." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/dai2025iccv-unbiased/)

BibTeX
@inproceedings{dai2025iccv-unbiased,
  title     = {{Unbiased Missing-Modality Multimodal Learning}},
  author    = {Dai, Ruiting and Li, Chenxi and Yan, Yandong and Mo, Lisi and Qin, Ke and He, Tao},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {24507--24517},
  url       = {https://mlanthology.org/iccv/2025/dai2025iccv-unbiased/}
}