Multimodal Fusion Using Multi-View Domains for Data Heterogeneity in Federated Learning

Abstract

Multimodal information plays an important role in the advanced Internet of Things (IoT) in the 6G era. Fused and analyzed via federated learning (FL), it provides reliable and comprehensive support for downstream tasks. One of the primary challenges in FL is data heterogeneity, which may lead to domain shifts and to sharply different long-tailed local category distributions across nodes. The resulting performance deterioration hinders large-scale deployment of FL in IoT applications equipped with a variety of multimodal sensors. In this paper, we propose a novel multimodal fusion framework to tackle these coupled problems, which arise when decentralized nodes equipped with diverse sensors cooperatively fuse multimodal information without exposing private data. Specifically, we introduce a flexible global logit alignment (GLA) method based on multi-view domains. This method enables the fusion of diverse multimodal information while accounting for domain shifts caused by modality-based data heterogeneity. Furthermore, we propose a novel local angular margin (LAM) scheme, which dynamically adjusts decision boundaries for locally seen categories while preserving global decision boundaries for unseen categories. This effectively mitigates the severe model divergence caused by significantly different category distributions. Extensive simulations demonstrate the superiority of the proposed framework, which shows significant merits in tackling model degeneration caused by data heterogeneity and in enhancing modality-based generalization in heterogeneous scenarios.
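
The abstract does not give the LAM formulation, so the following is only a minimal sketch of the general idea of a class-dependent angular margin, not the authors' method. It assumes an ArcFace-style additive margin on cosine logits; the function name `angular_margin_logits`, the per-class vector `class_margins` (taken to be zero for categories a node has never seen), and the `scale` factor are all hypothetical choices made for illustration.

```python
import numpy as np

def angular_margin_logits(features, class_weights, labels, class_margins, scale=16.0):
    """Cosine logits with an additive angular margin on each sample's target class.

    Assumption: class_margins[c] is 0 for categories absent from the local data.
    Non-target logits are never modified.
    """
    # L2-normalize so that the dot product equals cos(theta).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    theta = np.arccos(np.clip(f @ w.T, -1.0, 1.0))

    # Add the per-class margin to the target-class angle only, capped at pi.
    rows = np.arange(len(labels))
    theta[rows, labels] = np.minimum(theta[rows, labels] + class_margins[labels], np.pi)
    return scale * np.cos(theta)

# Toy example: 2 samples, 3 classes; only class 0 carries a margin.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
class_weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = np.array([0, 1])
class_margins = np.array([0.3, 0.0, 0.0])
logits = angular_margin_logits(features, class_weights, labels, class_margins, scale=10.0)
```

In this toy setup the margin lowers only the target logit of the sample labeled with class 0, which forces a larger angular gap for that class during training; the sample labeled with class 1, whose margin is zero, gets plain cosine logits.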

Cite

Text

Gao et al. "Multimodal Fusion Using Multi-View Domains for Data Heterogeneity in Federated Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I16.33839

Markdown

[Gao et al. "Multimodal Fusion Using Multi-View Domains for Data Heterogeneity in Federated Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/gao2025aaai-multimodal/) doi:10.1609/AAAI.V39I16.33839

BibTeX

@inproceedings{gao2025aaai-multimodal,
  title     = {{Multimodal Fusion Using Multi-View Domains for Data Heterogeneity in Federated Learning}},
  author    = {Gao, Min and Zheng, Haifeng and Feng, Xinxin and Tao, Ran},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {16736-16744},
  doi       = {10.1609/AAAI.V39I16.33839},
  url       = {https://mlanthology.org/aaai/2025/gao2025aaai-multimodal/}
}