Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks

Abstract

Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the exact reasons for its merits remained poorly understood, and several shortcomings hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight decay, as tools to decouple the weights' norm from the underlying optimized objective. This property highlights the connection between practices such as normalization, weight decay, and learning-rate adjustments. We suggest several alternatives to the widely used $L^2$ batch-norm, using normalization in $L^1$ and $L^\infty$ spaces that can substantially improve numerical stability in low-precision implementations as well as provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative to work for half-precision implementations. Finally, we suggest a modification to weight-normalization, which improves its performance on large-scale tasks.
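To make the $L^1$ alternative mentioned above concrete, here is a minimal sketch that replaces the standard-deviation denominator of batch-norm with a scaled mean absolute deviation. The function name, the use of PyTorch, and the $\sqrt{\pi/2}$ Gaussian correction constant are assumptions made for illustration, not the authors' reference implementation.

```python
import math
import torch

def l1_batch_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Sketch of an L1-based batch normalization for (N, C, H, W) inputs.

    Instead of dividing the centered activations by their standard deviation
    (an L2 statistic), divide by the per-channel mean absolute deviation.
    Under a Gaussian assumption, E|x - mu| = sigma * sqrt(2 / pi), so scaling
    the L1 statistic by sqrt(pi / 2) keeps the output on roughly the same
    scale as the usual L2 batch-norm (this constant is part of the sketch's
    assumptions).
    """
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    centered = x - mean
    # Per-channel mean absolute deviation: an L1 statistic, computed without
    # squaring the activations.
    mad = centered.abs().mean(dim=(0, 2, 3), keepdim=True)
    return centered / (math.sqrt(math.pi / 2) * mad + eps)
```

Because the reduction uses absolute values rather than squares and square roots, it is less prone to overflow and underflow in half-precision arithmetic, which is the numerical-stability benefit highlighted above.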

Cite

Text

Hoffer et al. "Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks." Neural Information Processing Systems, 2018.

Markdown

[Hoffer et al. "Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/hoffer2018neurips-norm/)

BibTeX

@inproceedings{hoffer2018neurips-norm,
  title     = {{Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks}},
  author    = {Hoffer, Elad and Banner, Ron and Golan, Itay and Soudry, Daniel},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {2160--2170},
  url       = {https://mlanthology.org/neurips/2018/hoffer2018neurips-norm/}
}