Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Abstract

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization ($\mu$P), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call *$\mu$Transfer*: parametrize the target model in $\mu$P, tune the HPs indirectly on a smaller model, and *zero-shot transfer* them to the full-sized model, i.e., without directly tuning the latter at all. We verify $\mu$Transfer on Transformer and ResNet. For example, 1) by transferring pretraining HPs from a model of 13M parameters, we outperform published numbers of BERT-large (350M parameters), with a total tuning cost equivalent to pretraining BERT-large once; 2) by transferring from 40M parameters, we outperform published numbers of the 6.7B GPT-3 model, with tuning cost only 7% of total pretraining cost. A PyTorch implementation of our technique can be found at github.com/microsoft/mup. See arxiv.org for the full, up-to-date version of this work.
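
Below is a minimal sketch of the $\mu$Transfer workflow using the `mup` PyTorch package from the repository linked above. The toy MLP, widths, and learning rate are illustrative assumptions rather than the paper's experimental setup; see the repository README for the full recipe, including $\mu$P-aware re-initialization.

```python
# Minimal sketch of muTransfer with the `mup` package (github.com/microsoft/mup).
# The toy MLP, widths, and learning rate are illustrative, not the paper's setup.
import torch
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width=128, d_in=784, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, width)
        self.fc2 = nn.Linear(width, width)
        # The output (readout) layer uses MuReadout so it scales according
        # to the Maximal Update Parametrization.
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        return self.readout(torch.relu(self.fc2(torch.relu(self.fc1(x)))))

# Base and delta models differ only in width; set_base_shapes compares their
# parameter shapes to infer how each layer scales with width.
base, delta = MLP(width=64), MLP(width=128)

# Target (large) model, parametrized in muP relative to the base shapes.
# (In practice, parameters are then re-initialized with mup's init replacements.)
model = MLP(width=4096)
set_base_shapes(model, base, delta=delta)

# HPs (e.g., learning rate) tuned on a narrow proxy model in muP can be reused
# here unchanged; MuAdam applies the muP learning-rate scaling per parameter group.
optimizer = MuAdam(model.parameters(), lr=1e-3)
```

The key point is that the same HP values found by tuning a narrow proxy model (e.g., the 64- or 128-wide MLP above) are reused for the wide target model without further search, because under $\mu$P the optima stay stable across width.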

Cite

Text

Yang et al. "Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer." Neural Information Processing Systems, 2021.

Markdown

[Yang et al. "Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/yang2021neurips-tuning/)

BibTeX

@inproceedings{yang2021neurips-tuning,
  title     = {{Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer}},
  author    = {Yang, Ge and Hu, Edward and Babuschkin, Igor and Sidor, Szymon and Liu, Xiaodong and Farhi, David and Ryder, Nick and Pachocki, Jakub and Chen, Weizhu and Gao, Jianfeng},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/yang2021neurips-tuning/}
}