Modular Block-Diagonal Curvature Approximations for Feedforward Architectures
Abstract
We propose a modular extension of backpropagation for the computation of block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, generalized Gauss-Newton, and positive-curvature Hessian). The approach reduces the otherwise tedious manual derivation of these matrices into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently-proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
Cite
Text
Dangel et al. "Modular Block-Diagonal Curvature Approximations for Feedforward Architectures." Artificial Intelligence and Statistics, 2020.
Markdown
[Dangel et al. "Modular Block-Diagonal Curvature Approximations for Feedforward Architectures." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/dangel2020aistats-modular/)
BibTeX
@inproceedings{dangel2020aistats-modular,
  title = {{Modular Block-Diagonal Curvature Approximations for Feedforward Architectures}},
  author = {Dangel, Felix and Harmeling, Stefan and Hennig, Philipp},
  booktitle = {Artificial Intelligence and Statistics},
  year = {2020},
  pages = {799-808},
  volume = {108},
  url = {https://mlanthology.org/aistats/2020/dangel2020aistats-modular/}
}