Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC
Abstract
Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
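As background for the two limitations named in the abstract, here is a minimal NumPy sketch of generic Kronecker-factored preconditioning for one linear layer. It is not the SINGD update and not the authors' code. The layer sizes, the random factors `A` and `B`, and the helper names `random_spd` and `kfac_precondition` are assumptions made for illustration. It shows that the dense factors are stored explicitly, and that applying the preconditioner needs a linear solve (an inversion) per factor.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 3, 4  # hypothetical layer sizes

# Hypothetical gradient of a linear layer's weight matrix.
G = rng.standard_normal((d_out, d_in))


def random_spd(n):
    """Random symmetric positive-definite matrix (stand-in for a Kronecker factor)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


A = random_spd(d_in)   # input-side factor, dense d_in x d_in
B = random_spd(d_out)  # output-side factor, dense d_out x d_out


def kfac_precondition(G, A, B):
    """Compute B^{-1} G A^{-1} with one solve per factor (A is symmetric)."""
    left = np.linalg.solve(B, G)          # B^{-1} G
    return np.linalg.solve(A, left.T).T   # (A^{-1} (B^{-1} G)^T)^T = B^{-1} G A^{-1}


precond = kfac_precondition(G, A, B)

# Reference: build the full Kronecker product and solve against the flattened gradient.
# With row-major flattening, kron(B, A) @ vec(X) = vec(B X A^T), and A is symmetric.
full = np.kron(B, A)
dense = np.linalg.solve(full, G.reshape(-1)).reshape(d_out, d_in)

factored_entries = A.size + B.size  # what the Kronecker factors store
full_entries = full.size            # what the unfactored matrix would store
```

Both routes give the same preconditioned gradient, but the factored route stores `d_in**2 + d_out**2` entries instead of `(d_in * d_out)**2`. The two solves are the kind of step the abstract describes as fragile in low precision.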
Cite
Text
Lin et al. "Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC." International Conference on Machine Learning, 2024.
Markdown
[Lin et al. "Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/lin2024icml-structured/)
BibTeX
@inproceedings{lin2024icml-structured,
title = {{Structured Inverse-Free Natural Gradient Descent: Memory-Efficient \& Numerically-Stable KFAC}},
author = {Lin, Wu and Dangel, Felix and Eschenhagen, Runa and Neklyudov, Kirill and Kristiadi, Agustinus and Turner, Richard E. and Makhzani, Alireza},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {29974--29991},
volume = {235},
url = {https://mlanthology.org/icml/2024/lin2024icml-structured/}
}