Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC

Abstract

Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
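The abstract only states the idea at a high level. As a rough illustration of what "inverse-free" means in this context, the sketch below maintains a factor K with K Kᵀ ≈ (A + λI)⁻¹ for a dense Kronecker-factor stand-in A using matrix multiplications only (a Newton-Schulz-style iteration), instead of calling a matrix inverse or eigendecomposition. This is a minimal, hypothetical sketch of the general principle, not the SINGD or KFAC update from the paper; the function names and constants are illustrative.

import numpy as np

def inverse_free_factor(A, damping=1e-3, num_steps=40):
    """Illustrative sketch (not the paper's update): track K with
    K @ K.T ~= inv(A + damping * I) using only matrix multiplications,
    i.e. without np.linalg.inv or an eigendecomposition."""
    d = A.shape[0]
    A_damped = A + damping * np.eye(d)
    # Scale the initial guess so the eigenvalues of K.T @ A_damped @ K
    # start in (0, 1], keeping the iteration in its convergence region.
    K = np.eye(d) / np.sqrt(np.trace(A_damped))
    for _ in range(num_steps):
        residual = K.T @ A_damped @ K - np.eye(d)  # zero iff K.T A K = I
        K = K @ (np.eye(d) - 0.5 * residual)       # multiplicative correction
    return K

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((64, 64))
    A = G @ G.T / 64                               # SPD stand-in for a dense Kronecker factor
    K = inverse_free_factor(A)
    A_inv = np.linalg.inv(A + 1e-3 * np.eye(64))   # reference, used only to check the result
    rel_err = np.linalg.norm(K @ K.T - A_inv) / np.linalg.norm(A_inv)
    print(f"relative error of K K^T vs. (A + damping I)^-1: {rel_err:.2e}")

Because such an update consists solely of matrix products, it avoids the explicit inversions and decompositions that the abstract identifies as the source of numerical instability in low-precision training; the paper's actual update and the imposed structures on the Kronecker factors are described in the full text.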

Cite

Text

Lin et al. "Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC." International Conference on Machine Learning, 2024.

Markdown

[Lin et al. "Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/lin2024icml-structured/)

BibTeX

@inproceedings{lin2024icml-structured,
  title     = {{Structured Inverse-Free Natural Gradient Descent: Memory-Efficient \& Numerically-Stable KFAC}},
  author    = {Lin, Wu and Dangel, Felix and Eschenhagen, Runa and Neklyudov, Kirill and Kristiadi, Agustinus and Turner, Richard E. and Makhzani, Alireza},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {29974--29991},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/lin2024icml-structured/}
}