On the Concurrence of Layer-Wise Preconditioning Methods and Provable Feature Learning

Abstract

Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms that introduce preconditioners per axis of each layer's weight tensors. These methods have seen a recent resurgence, demonstrating impressive performance relative to entry-wise ("diagonal") preconditioning methods such as Adam(W) on a wide range of neural network optimization tasks. Complementary to their practical performance, we demonstrate that layer-wise preconditioning methods are provably necessary from a statistical perspective. To showcase this, we consider two prototypical models, linear representation learning and single-index learning, which are widely used to study how typical algorithms efficiently learn useful features to enable generalization. In these problems, we show that SGD is a suboptimal feature learner once one moves beyond the ideal isotropic inputs $\mathbf{x} \sim \mathsf{N}(\mathbf{0}, \mathbf{I})$ and well-conditioned settings typically assumed in prior work. We demonstrate theoretically and numerically that this suboptimality is fundamental, and that layer-wise preconditioning emerges naturally as the solution. We further show that standard tools like Adam preconditioning and batch normalization only mildly mitigate these issues, supporting the unique benefits of layer-wise preconditioning.
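To make the entry-wise vs. layer-wise distinction concrete, below is a minimal NumPy sketch, not taken from the paper, contrasting a diagonal Adam-style update (one scalar preconditioner per weight entry) with a layer-wise Shampoo-style update (one preconditioner per axis of the weight matrix). All function names, hyperparameters, and the use of exponential moving averages for the axis statistics are illustrative assumptions for this sketch.

```python
import numpy as np

def _inv_root(A, p):
    """Inverse p-th root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.clip(vals, 1e-12, None) ** (-1.0 / p)) @ vecs.T

def entrywise_adam_step(W, G, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Entry-wise ("diagonal") preconditioning: every weight entry is rescaled
    independently by a running second-moment estimate (bias correction omitted
    for brevity)."""
    m = b1 * m + (1 - b1) * G
    v = b2 * v + (1 - b2) * G**2
    W = W - lr * m / (np.sqrt(v) + eps)
    return W, m, v

def layerwise_shampoo_step(W, G, L, R, lr=1e-3, beta=0.99, eps=1e-8):
    """Layer-wise preconditioning: maintain one preconditioner per axis of the
    (d_out, d_in) weight matrix and apply their inverse fourth roots on each
    side of the gradient, as in Shampoo-style updates. (Original Shampoo uses
    running sums for L and R; an EMA is an illustrative variant here.)"""
    L = beta * L + (1 - beta) * G @ G.T   # left (row-space) statistics, d_out x d_out
    R = beta * R + (1 - beta) * G.T @ G   # right (column-space) statistics, d_in x d_in
    L_inv4 = _inv_root(L + eps * np.eye(L.shape[0]), 4)
    R_inv4 = _inv_root(R + eps * np.eye(R.shape[0]), 4)
    W = W - lr * L_inv4 @ G @ R_inv4
    return W, L, R

# Illustrative usage on one layer's weight matrix and gradient:
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
G = rng.standard_normal((4, 3))
L0, R0 = np.zeros((4, 4)), np.zeros((3, 3))
W, L0, R0 = layerwise_shampoo_step(W, G, L0, R0)
```

Note the memory trade-off the abstract alludes to: for a d_out x d_in layer, the layer-wise method stores two small matrices (d_out^2 + d_in^2 entries) rather than a full (d_out * d_in)-sized Kronecker preconditioner, while still correcting for ill-conditioned, anisotropic inputs that a purely diagonal method cannot.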

Cite

Text

Zhang et al. "On the Concurrence of Layer-Wise Preconditioning Methods and Provable Feature Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhang et al. "On the Concurrence of Layer-Wise Preconditioning Methods and Provable Feature Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-concurrence/)

BibTeX

@inproceedings{zhang2025icml-concurrence,
  title     = {{On the Concurrence of Layer-Wise Preconditioning Methods and Provable Feature Learning}},
  author    = {Zhang, Thomas T. C. K. and Moniri, Behrad and Nagwekar, Ansh and Rahman, Faraz and Xue, Anton and Hassani, Hamed and Matni, Nikolai},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {75793--75833},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhang2025icml-concurrence/}
}