The Non-IID Data Quagmire of Decentralized Machine Learning

Abstract

Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their differing contexts cause substantial skew in the data distribution across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.
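The abstract treats the degree of label skew across partitions as the key experimental variable. The Python below is a minimal sketch of how label-skewed partitions can be generated for such experiments. It assumes a simple parametrization that is illustrative only and not taken from the paper's code: a fraction `skew` of the examples is placed by label, so each class lives on exactly one partition, and the remaining examples are spread uniformly. The function names `partition_with_label_skew` and `label_histograms` are hypothetical.

```python
import random
from collections import Counter


def partition_with_label_skew(labels, num_partitions, skew, seed=0):
    """Split example indices into partitions with a tunable degree of label skew.

    Hypothetical parametrization (illustrative only): a fraction `skew` of the
    examples is assigned by label, so each class goes to a single partition;
    the rest are distributed uniformly. skew=0.0 is IID, skew=1.0 is label-disjoint.
    """
    if not 0.0 <= skew <= 1.0:
        raise ValueError("skew must be in [0, 1]")
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Give each distinct label one owning partition, round-robin over sorted labels.
    classes = sorted(set(labels))
    owner = {c: i % num_partitions for i, c in enumerate(classes)}

    num_skewed = int(round(skew * len(indices)))
    partitions = [[] for _ in range(num_partitions)]
    for idx in indices[:num_skewed]:
        partitions[owner[labels[idx]]].append(idx)  # placement decided by label
    for j, idx in enumerate(indices[num_skewed:]):
        partitions[j % num_partitions].append(idx)  # uniform placement of the rest
    return partitions


def label_histograms(labels, partitions):
    """Count how many examples of each label every partition holds."""
    return [Counter(labels[i] for i in part) for part in partitions]


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]  # 10 balanced classes, 100 examples each
    parts = partition_with_label_skew(labels, num_partitions=2, skew=1.0)
    hists = label_histograms(labels, parts)
    print(sorted(hists[0]))  # [0, 2, 4, 6, 8]
    print(sorted(hists[1]))  # [1, 3, 5, 7, 9]
```

With `skew=1.0` and two partitions over ten balanced classes, the partitions are label-disjoint (even labels on one, odd on the other). With `skew=0.0` every partition sees roughly the same label mix. Intermediate values move between these two extremes, which is one way to vary the degree of skew in an experiment.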

Cite

Text

Hsieh et al. "The Non-IID Data Quagmire of Decentralized Machine Learning." International Conference on Machine Learning, 2020.

Markdown

[Hsieh et al. "The Non-IID Data Quagmire of Decentralized Machine Learning." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/hsieh2020icml-noniid/)

BibTeX

@inproceedings{hsieh2020icml-noniid,
  title     = {{The Non-IID Data Quagmire of Decentralized Machine Learning}},
  author    = {Hsieh, Kevin and Phanishayee, Amar and Mutlu, Onur and Gibbons, Phillip},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {4387--4398},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/hsieh2020icml-noniid/}
}