Lo-Fi: Distributed Fine-Tuning Without Communication

Abstract

When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node fine-tunes independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.
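
The lo-fi recipe in the abstract reduces to two steps: fine-tune independent replicas with no gradient communication, then average their weights once at the end. Below is a minimal sketch of that recipe in PyTorch, assuming a toy setting; the linear model, synthetic data shards, the fine_tune and average_weights helpers, and the SGD hyperparameters are illustrative stand-ins, not the paper's DeiT/OPT training setup.

# Minimal lo-fi sketch (assumptions: toy linear model, synthetic shards, plain SGD).
import copy
import torch

def fine_tune(model, data, epochs=3, lr=0.01):
    # Each node fine-tunes its own replica locally; no gradients are communicated.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def average_weights(models):
    # Average parameters across nodes once, at the conclusion of fine-tuning.
    avg = copy.deepcopy(models[0].state_dict())
    for key in avg:
        avg[key] = torch.stack([m.state_dict()[key].float() for m in models]).mean(dim=0)
    merged = copy.deepcopy(models[0])
    merged.load_state_dict(avg)
    return merged

# Toy setup: one shared pretrained model, four nodes, disjoint synthetic data shards.
pretrained = torch.nn.Linear(8, 1)
shards = [[(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(10)]
          for _ in range(4)]
replicas = [fine_tune(copy.deepcopy(pretrained), shard) for shard in shards]
lofi_model = average_weights(replicas)

In this sketch the only cross-node operation is the final state-dict average, which is what removes the per-step communication requirement the baseline relies on.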

Cite

Text

Wortsman et al. "Lo-Fi: Distributed Fine-Tuning Without Communication." Transactions on Machine Learning Research, 2023.

Markdown

[Wortsman et al. "Lo-Fi: Distributed Fine-Tuning Without Communication." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/wortsman2023tmlr-lofi/)

BibTeX

@article{wortsman2023tmlr-lofi,
  title     = {{Lo-Fi: Distributed Fine-Tuning Without Communication}},
  author    = {Wortsman, Mitchell and Gururangan, Suchin and Li, Shen and Farhadi, Ali and Schmidt, Ludwig and Rabbat, Michael and Morcos, Ari S.},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/wortsman2023tmlr-lofi/}
}