Critical Feature Learning in Deep Neural Networks
Abstract
A key property of neural networks driving their success is their ability to learn features from data. Understanding feature learning from a theoretical viewpoint is an emerging field with many open questions. In this work we capture finite-width effects with a systematic theory of network kernels in deep non-linear neural networks. We show that the Bayesian prior of the network can be written in closed form as a superposition of Gaussian processes, whose kernels are distributed with a variance that depends inversely on the network width $N$. A large deviation approach, which is exact in the proportional limit for the number of data points $P=\alpha N\to\infty$, yields a pair of forward-backward equations for the maximum a posteriori kernels in all layers at once. We study their solutions perturbatively, to demonstrate how the backward propagation across layers aligns kernels with the target. An alternative field-theoretic formulation shows that kernel adaptation of the Bayesian posterior at finite-width results from fluctuations in the prior: larger fluctuations correspond to a more flexible network prior and thus enable stronger adaptation to data. We thus find a bridge between the classical edge-of-chaos NNGP theory and feature learning, exposing an intricate interplay between criticality, response functions, and feature scale.
Cite
Text
Fischer et al. "Critical Feature Learning in Deep Neural Networks." International Conference on Machine Learning, 2024.Markdown
[Fischer et al. "Critical Feature Learning in Deep Neural Networks." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/fischer2024icml-critical/)BibTeX
@inproceedings{fischer2024icml-critical,
title = {{Critical Feature Learning in Deep Neural Networks}},
author = {Fischer, Kirsten and Lindner, Javed and Dahmen, David and Ringel, Zohar and Krämer, Michael and Helias, Moritz},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {13660-13690},
volume = {235},
url = {https://mlanthology.org/icml/2024/fischer2024icml-critical/}
}