Transition to Linearity of Wide Neural Networks Is an Emerging Property of Assembling Weak Models
Abstract
Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent. These findings seem counter-intuitive since, in general, neural networks are highly complex models. Why does a linear structure emerge when neural networks become wide? In this work, we provide a new perspective on this "transition to linearity" by considering a neural network as an assembly model recursively built from a set of sub-models corresponding to individual neurons. In this view, we show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
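The near-constancy of the NTK claimed in the abstract can be checked empirically. Below is a minimal, illustrative sketch (not the paper's code): for a two-layer network f(x) = vᵀφ(Wx)/√m, it computes the tangent kernel analytically and measures how much it changes under a fixed-norm parameter perturbation as the width m grows. The ReLU activation, NTK-style 1/√m scaling, and unit-norm perturbation are assumptions made for the demonstration.

```python
# Sketch: the tangent kernel of f(x) = v^T relu(W x) / sqrt(m) changes less and
# less under a unit-norm parameter perturbation as the width m grows.
import numpy as np

rng = np.random.default_rng(0)

def tangent_kernel(W, v, X):
    """K(x, x') = <df/dtheta(x), df/dtheta(x')> for f(x) = v^T relu(W x)/sqrt(m)."""
    m = W.shape[0]
    pre = X @ W.T                      # (n, m) pre-activations
    act = np.maximum(pre, 0.0)         # relu(W x)
    dact = (pre > 0).astype(float)     # relu'(W x)
    # df/dv_i = relu(w_i . x) / sqrt(m)  -> contributes act @ act.T / m
    K_v = act @ act.T / m
    # df/dW_i = v_i relu'(w_i . x) x / sqrt(m)
    # -> contributes (x . x') * sum_i v_i^2 relu'(w_i . x) relu'(w_i . x') / m
    K_W = (X @ X.T) * ((dact * v**2) @ dact.T) / m
    return K_v + K_W

n, d = 10, 5
X = rng.standard_normal((n, d))

for m in [10, 100, 1000, 10000]:
    W = rng.standard_normal((m, d))
    v = rng.standard_normal(m)
    K0 = tangent_kernel(W, v, X)
    # unit-norm perturbation of all parameters (radius independent of width)
    dW = rng.standard_normal((m, d))
    dv = rng.standard_normal(m)
    norm = np.sqrt(np.sum(dW**2) + np.sum(dv**2))
    K1 = tangent_kernel(W + dW / norm, v + dv / norm, X)
    rel = np.linalg.norm(K1 - K0) / np.linalg.norm(K0)
    print(f"width m = {m:6d}   relative NTK change = {rel:.4f}")
```

Under these assumptions, the printed relative change shrinks roughly like 1/√m, consistent with the transition to linearity described above.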
Cite
Text

Liu et al. "Transition to Linearity of Wide Neural Networks Is an Emerging Property of Assembling Weak Models." International Conference on Learning Representations, 2022.

Markdown

[Liu et al. "Transition to Linearity of Wide Neural Networks Is an Emerging Property of Assembling Weak Models." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/liu2022iclr-transition/)

BibTeX
@inproceedings{liu2022iclr-transition,
  title = {{Transition to Linearity of Wide Neural Networks Is an Emerging Property of Assembling Weak Models}},
  author = {Liu, Chaoyue and Zhu, Libin and Belkin, Misha},
  booktitle = {International Conference on Learning Representations},
  year = {2022},
  url = {https://mlanthology.org/iclr/2022/liu2022iclr-transition/}
}