Visual Attention Prompted Prediction and Learning

Abstract

Classical Federated Learning (FL) encounters significant challenges when deploying large models on power-constrained clients. To tackle this, we propose an asymmetric FL mechanism that enables the aggregation of compact client models into a comprehensive server model. We design the server model as a Mixture-of-Experts (MoE), where each expert shares the same architecture as the client models. This uniformity allows the most pertinent client models to be efficiently fused into each server expert, based on the measured relevance between each client and each server expert. To address the challenge of non-IID data, we further optimize the server-side MoE architecture by incorporating a main expert that is always activated alongside a set of selectively activated routed experts. This configuration balances learning general knowledge against fitting specific data distributions. Our Fed-MoE framework is model-agnostic and has demonstrated notable improvements on vision FL tasks with million-scale ResNet backbones, and on language tasks with billion-scale BERT and GPT-2 backbones.
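The abstract's server-side design, a main expert that always activates plus a softmax-weighted top-k selection over routed experts, can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function and parameter names are illustrative, and experts are reduced to plain weight matrices for clarity:

```python
import numpy as np

def moe_forward(x, main_w, routed_ws, gate_w, k=2):
    """Illustrative sketch: combine an always-active main expert with
    the top-k routed experts chosen by a linear gate.

    x         -- input vector, shape (d,)
    main_w    -- main-expert weight matrix, shape (d, d)
    routed_ws -- list of routed-expert weight matrices, each (d, d)
    gate_w    -- router weights, shape (d, num_experts)
    """
    out = x @ main_w                         # main expert: always activated
    scores = x @ gate_w                      # router logits over routed experts
    top = np.argsort(scores)[-k:]            # indices of the top-k experts
    w = np.exp(scores[top])
    w /= w.sum()                             # softmax over the selected logits
    for wi, ei in zip(w, top):
        out = out + wi * (x @ routed_ws[ei]) # weighted routed-expert outputs
    return out
```

Under this sketch, the main expert captures knowledge shared across all clients, while the router directs each input to the routed experts whose (client-derived) parameters best match its distribution.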

Cite

Text

Zhang et al. "Visual Attention Prompted Prediction and Learning." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/610

Markdown

[Zhang et al. "Visual Attention Prompted Prediction and Learning." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/zhang2024ijcai-visual/) doi:10.24963/ijcai.2024/610

BibTeX

@inproceedings{zhang2024ijcai-visual,
  title     = {{Visual Attention Prompted Prediction and Learning}},
  author    = {Zhang, Yifei and Pan, Bo and Gu, Siyi and Bai, Guangji and Qiu, Meikang and Yang, Xiaofeng and Zhao, Liang},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {5517--5525},
  doi       = {10.24963/ijcai.2024/610},
  url       = {https://mlanthology.org/ijcai/2024/zhang2024ijcai-visual/}
}