Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning

Abstract

Federated Learning (FL) is an emerging machine learning framework that enables multiple clients (coordinated by a server) to collaboratively train a global model by aggregating the locally trained models without sharing any client's training data. Recent works have observed that learning in a federated manner may lead the aggregated global model to converge to a 'sharp minimum', thereby adversely affecting the generalizability of this FL-trained model. Therefore, in this work, we aim to improve the generalization performance of models trained in a federated setup by introducing a 'flatness'-constrained FL optimization problem. This flatness constraint is imposed on the top eigenvalue of the Hessian computed from the training loss. As each client trains a model on its local data, we further re-formulate this complex problem using the client loss functions and propose a new computationally efficient regularization technique, dubbed 'MAN', which Minimizes the Activation Norm of each layer of the client-side models. We also theoretically show that minimizing the activation norm reduces the top eigenvalue of the layer-wise Hessian of the client's loss, which in turn decreases the overall Hessian's top eigenvalue, ensuring convergence to a flat minimum. We apply our proposed flatness-constrained optimization to existing FL techniques and obtain significant improvements, thereby establishing a new state-of-the-art.
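The core idea of the MAN regularizer described above can be illustrated with a minimal sketch: each client augments its task loss with a penalty on the squared norm of every layer's activations. The code below is a toy, dependency-free illustration, not the authors' implementation; the network, the weighting `lam`, and the helper names are all hypothetical.

```python
import random

random.seed(0)

def matvec(W, x):
    # Plain matrix-vector product for a small dense layer.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def sq_norm(v):
    # Squared Euclidean norm of one layer's activation vector.
    return sum(a * a for a in v)

def man_loss(task_loss, activations, lam=0.01):
    # MAN-style objective (illustrative): client task loss plus a
    # weighted sum of layerwise squared activation norms. 'lam' is a
    # hypothetical regularization coefficient, not from the paper.
    return task_loss + lam * sum(sq_norm(a) for a in activations)

# Tiny two-layer forward pass on one client, collecting the
# per-layer activations that the regularizer penalizes.
W1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
x = [0.5, -0.2, 0.1, 0.3]

h1 = relu(matvec(W1, x))          # hidden-layer activation
out = matvec(W2, h1)              # output-layer activation
activations = [h1, out]

target = [1.0, 0.0]
task_loss = sq_norm([o - t for o, t in zip(out, target)])  # toy MSE
total = man_loss(task_loss, activations, lam=0.01)
print(total >= task_loss)  # penalty is non-negative, so this prints True
```

In the federated setting, each client would minimize this regularized loss locally before the server aggregates the models as usual, so the flatness constraint adds no extra communication cost.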

Cite

Text

Yashwanth et al. "Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Yashwanth et al. "Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/yashwanth2024wacv-minimizing/)

BibTeX

@inproceedings{yashwanth2024wacv-minimizing,
  title     = {{Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning}},
  author    = {Yashwanth, M. and Nayak, Gaurav Kumar and Rangwani, Harsh and Singh, Arya and Babu, R. Venkatesh and Chakraborty, Anirban},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {2287--2296},
  url       = {https://mlanthology.org/wacv/2024/yashwanth2024wacv-minimizing/}
}