Gradient Dynamics of Single-Neuron Autoencoders on Orthogonal Data
Abstract
In this work we investigate the dynamics of (stochastic) gradient descent when training a single-neuron ReLU autoencoder on orthogonal inputs. We show that for this non-convex problem there exists a manifold of global minima, all with the same maximum Hessian eigenvalue, and that gradient descent reaches a particular global minimum when initialized randomly. Interestingly, which minimum is reached depends heavily on the batch size. Under full-batch gradient descent, the directions of the neuron that are initially positively correlated with the data are merely rescaled uniformly, so in high dimensions the learned neuron is a near-uniform mixture of these directions. With batch size one, by contrast, the neuron aligns exactly with a single such direction, showing that small batch sizes induce a qualitatively different kind of "feature selection".
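To make the contrast concrete, here is a minimal NumPy sketch (not the authors' code) of the experiment the abstract describes. It assumes a tied-weight model x ↦ ReLU(w·x)·w with squared-error loss; the model form, step size, dimension, and step count are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50                       # ambient dimension; inputs are the standard basis
X = np.eye(d)                # n = d mutually orthogonal (orthonormal) inputs

def grad(w, x):
    """Gradient of 0.5 * ||x - relu(w @ x) * w||**2 for a tied-weight
    single-neuron ReLU autoencoder (the tied-weight form is an assumption
    about the model studied in the paper)."""
    a = w @ x
    if a <= 0:               # ReLU inactive: reconstruction and gradient vanish
        return np.zeros_like(w)
    r = a * w - x            # residual x_hat - x
    return a * r + (r @ w) * x

def train(batch_one, lr=0.05, steps=20000):
    w = rng.normal(size=d) / np.sqrt(d)        # small random initialization
    for _ in range(steps):
        if batch_one:        # SGD, batch size 1: one uniformly sampled input
            w = w - lr * grad(w, X[rng.integers(d)])
        else:                # full-batch GD: average gradient over all inputs
            w = w - lr * np.mean([grad(w, x) for x in X], axis=0)
    return w

for name, w in [("full batch  ", train(False)), ("batch size 1", train(True))]:
    c = np.maximum(X @ w, 0.0)                 # positive correlations with inputs
    share = c.max() ** 2 / (w @ w)             # energy in the largest direction
    print(f"{name}: ||w|| = {np.linalg.norm(w):.3f}, "
          f"top-direction share = {share:.3f}")
```

Under the abstract's claims, the full-batch run should end with its norm spread across many initially positive directions (small top-direction share), while the batch-size-1 run should concentrate nearly all of its norm on a single direction (share near 1).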
Cite
Text
Ghosh et al. "Gradient Dynamics of Single-Neuron Autoencoders on Orthogonal Data." NeurIPS 2022 Workshops: OPT, 2022.
Markdown
[Ghosh et al. "Gradient Dynamics of Single-Neuron Autoencoders on Orthogonal Data." NeurIPS 2022 Workshops: OPT, 2022.](https://mlanthology.org/neuripsw/2022/ghosh2022neuripsw-gradient/)
BibTeX
@inproceedings{ghosh2022neuripsw-gradient,
title = {{Gradient Dynamics of Single-Neuron Autoencoders on Orthogonal Data}},
author = {Ghosh, Nikhil and Frei, Spencer and Ha, Wooseok and Yu, Bin},
booktitle = {NeurIPS 2022 Workshops: OPT},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/ghosh2022neuripsw-gradient/}
}