Generative Feature Training of Thin 2-Layer Networks

Abstract

We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights based on the squared loss and small datasets. Due to the highly non-convex energy landscape, gradient-based training often suffers from local minima. As a remedy, we initialize the hidden weights with samples from a learned proposal distribution, which we parameterize as a deep generative model. To train this model, we exploit the fact that with fixed hidden weights, the optimal output weights solve a linear equation. After learning the generative model, we refine the sampled weights with gradient-based post-processing in the latent space. Here, we also include a regularization scheme to counteract potential noise. Finally, we demonstrate the effectiveness of our approach with numerical examples.
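For intuition, the closed-form output-weight step mentioned in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the authors' implementation: assuming a 2-layer ReLU network on toy 1D regression data, it shows that once the hidden weights are fixed, the squared-loss-optimal output weights are obtained by solving a linear least-squares problem. The activation, width, and data are placeholder assumptions.

```python
# Minimal sketch (assumptions: ReLU activation, toy 1D data; not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) on a small dataset.
x = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sin(3.0 * x)

# Fixed hidden weights, e.g. sampled from a proposal distribution
# (here simply a standard normal for illustration).
n_hidden = 20
W = rng.standard_normal((1, n_hidden))   # hidden weights
b = rng.standard_normal(n_hidden)        # hidden biases

# Hidden-layer features for fixed (W, b); a constant column models the output bias.
features = np.maximum(x @ W + b, 0.0)               # ReLU features, shape (50, n_hidden)
features = np.hstack([features, np.ones((50, 1))])  # output-bias column

# With the hidden weights fixed, the squared-loss-optimal output weights
# are the solution of a linear least-squares problem.
a, *_ = np.linalg.lstsq(features, y, rcond=None)

mse = np.mean((features @ a - y) ** 2)
print(f"training MSE with fixed hidden weights: {mse:.4f}")
```

In the paper, this property is what makes the hidden weights the only quantities that need to be learned via the generative proposal; the output weights follow in closed form for any sampled set of hidden weights.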

Cite

Text

Hertrich and Neumayer. "Generative Feature Training of Thin 2-Layer Networks." Transactions on Machine Learning Research, 2025.

Markdown

[Hertrich and Neumayer. "Generative Feature Training of Thin 2-Layer Networks." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/hertrich2025tmlr-generative/)

BibTeX

@article{hertrich2025tmlr-generative,
  title     = {{Generative Feature Training of Thin 2-Layer Networks}},
  author    = {Hertrich, Johannes and Neumayer, Sebastian},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/hertrich2025tmlr-generative/}
}