On the Distance Between Two Neural Networks and the Stability of Learning
Abstract
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems to require little to no learning rate tuning, it may unlock a simpler workflow for training deeper and more complex neural networks. The Python code used in this paper is here: https://github.com/jxbz/fromage.
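The learning rule referenced in the abstract is implemented in the linked fromage repository. Below is a minimal PyTorch-style sketch of the layer-wise update as described in the paper: each layer's gradient step is scaled by the ratio of weight norm to gradient norm, and the weights are then shrunk slightly so their norm does not grow. The function name, default step size, and epsilon handling here are illustrative assumptions, not the repository's API.

```python
import torch

def fromage_style_step(params, lr=0.01):
    """One layer-wise update in the spirit of the paper's learning rule (sketch).

    For each parameter tensor p with gradient g:
      p <- (p - lr * (||p|| / ||g||) * g) / sqrt(1 + lr^2)
    The final rescaling keeps the weight norm from growing over time.
    """
    shrink = 1.0 / (1.0 + lr ** 2) ** 0.5
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            w_norm = p.norm()
            g_norm = p.grad.norm()
            if w_norm > 0 and g_norm > 0:
                # Relative step: gradient rescaled to the layer's weight scale.
                p.add_(p.grad, alpha=-lr * (w_norm / g_norm).item())
            else:
                # Fallback for zero-norm weights or gradients.
                p.add_(p.grad, alpha=-lr)
            p.mul_(shrink)
```

For the exact optimizer used in the experiments, see the official code at https://github.com/jxbz/fromage.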
Cite
Text
Bernstein et al. "On the Distance Between Two Neural Networks and the Stability of Learning." Neural Information Processing Systems, 2020.
Markdown
[Bernstein et al. "On the Distance Between Two Neural Networks and the Stability of Learning." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/bernstein2020neurips-distance/)
BibTeX
@inproceedings{bernstein2020neurips-distance,
  title = {{On the Distance Between Two Neural Networks and the Stability of Learning}},
  author = {Bernstein, Jeremy and Vahdat, Arash and Yue, Yisong and Liu, Ming-Yu},
  booktitle = {Neural Information Processing Systems},
  year = {2020},
  url = {https://mlanthology.org/neurips/2020/bernstein2020neurips-distance/}
}