An Optimal Transport-Based Latent Mixer for Robust Multi-Modal Learning

Abstract

Multi-modal learning aims to learn predictive models from data of different modalities. However, because of data security and privacy requirements, real-world multi-modal data are often scattered across different agents and cannot be shared among them, which limits the applicability of existing multi-modal learning methods. To achieve robust multi-modal learning in this challenging scenario, we propose a novel optimal transport-based mixer (OTM), which works as an effective latent code alignment and augmentation method for unaligned and distributed multi-modal data. In particular, we train a Wasserstein autoencoder (WAE) for each agent, which encodes its single-modal samples in a latent space. Through a central server, the proposed OTM computes a stochastic fused Gromov-Wasserstein barycenter (FGWB) to mix the latent codes of different modalities, so that each agent can apply the barycenter to reconstruct its samples. This method neither requires well-aligned multi-modal data nor assumes that the data share the same latent distribution, and each agent can learn a specific model based on multi-modal data while performing inference with only its local modality. Experiments on multi-modal clustering and classification demonstrate that the models learned with the OTM method outperform the corresponding baselines.
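
For readers unfamiliar with the fused Gromov-Wasserstein (FGW) barycenter named in the abstract, the standard formulation from the optimal transport literature can be sketched as follows. This is the general textbook form and may differ from the stochastic variant the paper actually uses. All notation here ($X_k$, $C_k$, $p_k$, $\alpha$, $\lambda_k$, and the squared losses) is assumed for illustration and does not come from the abstract.

```latex
% Assumed notation: agent k holds latent codes X_k (rows x_{k,j}),
% a pairwise structure matrix C_k, and sample weights p_k.
% (X, C, p) denotes the candidate barycenter.
\begin{align*}
% FGW discrepancy between the barycenter and agent k
\mathrm{FGW}_{\alpha}\big((X, C, p), (X_k, C_k, p_k)\big)
  &= \min_{T \in \Pi(p, p_k)}
     (1-\alpha) \sum_{i,j} \lVert x_i - x_{k,j} \rVert_2^2 \, T_{ij}
     + \alpha \sum_{i,j,i',j'} \big( C_{i i'} - (C_k)_{j j'} \big)^2 \, T_{ij} \, T_{i' j'} \\
% Set of transport plans with marginals p and p_k
\Pi(p, p_k)
  &= \big\{ T \ge 0 \;:\; T \mathbf{1} = p, \;\; T^{\top} \mathbf{1} = p_k \big\} \\
% Barycenter over K agents, with \lambda_k \ge 0 and \sum_k \lambda_k = 1
(X^{\star}, C^{\star})
  &\in \arg\min_{X, C} \sum_{k=1}^{K} \lambda_k \,
     \mathrm{FGW}_{\alpha}\big((X, C, p), (X_k, C_k, p_k)\big)
\end{align*}
```

In this sketch, $\alpha \in [0, 1]$ trades off matching the latent codes themselves (first term) against matching their pairwise structure (second term). The first term presumes that all agents' latent codes have the same dimension, which is an assumption of the sketch rather than a statement from the abstract.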

Cite

Text

Gong et al. "An Optimal Transport-Based Latent Mixer for Robust Multi-Modal Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I16.33849

Markdown

[Gong et al. "An Optimal Transport-Based Latent Mixer for Robust Multi-Modal Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/gong2025aaai-optimal/) doi:10.1609/AAAI.V39I16.33849

BibTeX

@inproceedings{gong2025aaai-optimal,
  title     = {{An Optimal Transport-Based Latent Mixer for Robust Multi-Modal Learning}},
  author    = {Gong, Fengjiao and Yue, Angxiao and Xu, Hongteng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {16826--16834},
  doi       = {10.1609/AAAI.V39I16.33849},
  url       = {https://mlanthology.org/aaai/2025/gong2025aaai-optimal/}
}