Revisiting Weight Initialization of Deep Neural Networks

Abstract

The proper initialization of weights is crucial for the effective training and fast convergence of deep neural networks (DNNs). Prior work in this area has mostly focused on the principle of balancing the variance among weights per layer to maintain stability of (i) the input data propagated forwards through the network and (ii) the loss gradients propagated backwards. This prevalent heuristic is, however, agnostic of dependencies among gradients across the various layers and captures only first-order effects per layer. In this paper, we investigate a unifying approach, based on approximating and controlling the norm of the layers' Hessians, which both generalizes and explains existing initialization schemes, including those derived for smooth activation functions, Dropouts, and ReLU.
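For context, the variance-balancing heuristic that the paper revisits can be illustrated with a minimal NumPy sketch. This is not the authors' code; the function names and layer sizes below are chosen purely for illustration of the classical Glorot and He scaling rules mentioned above.

import numpy as np

rng = np.random.default_rng(0)

def glorot_normal(fan_in, fan_out):
    # Balance forward/backward signal variance: Var[W] = 2 / (fan_in + fan_out).
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_out, fan_in))

def he_normal(fan_in, fan_out):
    # Variant for ReLU activations: Var[W] = 2 / fan_in.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

# Example: initialize a fully connected network with layer sizes 784 -> 256 -> 64 -> 10.
sizes = [784, 256, 64, 10]
weights = [he_normal(m, n) for m, n in zip(sizes[:-1], sizes[1:])]

These rules scale each layer's weight variance by its fan-in (and/or fan-out) only; the paper's point is that such per-layer, first-order rules ignore cross-layer dependencies, which its Hessian-norm analysis accounts for.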

Cite

Text

Skorski et al. "Revisiting Weight Initialization of Deep Neural Networks." Proceedings of The 13th Asian Conference on Machine Learning, 2021.

Markdown

[Skorski et al. "Revisiting Weight Initialization of Deep Neural Networks." Proceedings of The 13th Asian Conference on Machine Learning, 2021.](https://mlanthology.org/acml/2021/skorski2021acml-revisiting/)

BibTeX

@inproceedings{skorski2021acml-revisiting,
  title     = {{Revisiting Weight Initialization of Deep Neural Networks}},
  author    = {Skorski, Maciej and Temperoni, Alessandro and Theobald, Martin},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  year      = {2021},
  pages     = {1192--1207},
  volume    = {157},
  url       = {https://mlanthology.org/acml/2021/skorski2021acml-revisiting/}
}