Scaling Laws for the Principled Design, Initialization, and Preconditioning of ReLU Networks

Abstract

In this work, we describe a set of rules for the design and initialization of well-conditioned neural networks, guided by the goal of naturally balancing the diagonal blocks of the Hessian at the start of training. We show how our measure of conditioning of a block relates to another natural measure of conditioning, the ratio of weight gradients to the weights. We prove that for a ReLU-based deep multilayer perceptron, a simple initialization scheme using the geometric mean of the fan-in and fan-out satisfies our scaling rule. For more sophisticated architectures, we show how our scaling principle can be used to guide design choices to produce well-conditioned neural networks, reducing guesswork.
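As a rough illustration of the geometric-mean idea, the sketch below draws a ReLU-layer weight matrix with variance proportional to the inverse of the geometric mean of fan-in and fan-out. The `gain=2.0` default mirrors the usual ReLU variance correction; the exact constant and scaling used in the paper may differ, so treat this as a hypothetical rendering rather than the authors' implementation.

```python
import numpy as np

def geometric_mean_init(fan_in, fan_out, gain=2.0, rng=None):
    """Draw a (fan_out, fan_in) weight matrix whose variance scales
    inversely with the geometric mean of fan-in and fan-out.

    NOTE: gain=2.0 follows the common ReLU correction; the precise
    constant in the paper may differ -- this is an illustrative sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Variance = gain / sqrt(fan_in * fan_out), so the standard
    # deviation is the square root of that quantity.
    std = np.sqrt(gain / np.sqrt(fan_in * fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))
```

For a square layer (fan-in equals fan-out) this coincides with the familiar He initialization; the two schemes diverge only when the layer widths are unequal.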

Cite

Text

Defazio and Bottou. "Scaling Laws for the Principled Design, Initialization, and Preconditioning of ReLU Networks." International Conference on Learning Representations, 2020.

Markdown

[Defazio and Bottou. "Scaling Laws for the Principled Design, Initialization, and Preconditioning of ReLU Networks." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/defazio2020iclr-scaling/)

BibTeX

@inproceedings{defazio2020iclr-scaling,
  title     = {{Scaling Laws for the Principled Design, Initialization, and Preconditioning of ReLU Networks}},
  author    = {Defazio, Aaron and Bottou, L{\'e}on},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/defazio2020iclr-scaling/}
}