Modular Block-Diagonal Curvature Approximations for Feedforward Architectures

Abstract

We propose a modular extension of backpropagation for computing block-diagonal approximations to various curvature matrices of the training objective (in particular, the Hessian, the generalized Gauss-Newton, and the positive-curvature Hessian). The approach breaks the otherwise tedious manual derivation of these matrices down into local modules, and is easy to integrate into existing machine learning libraries. Moreover, we develop a compact notation derived from matrix differential calculus. We outline different strategies applicable to our method. They subsume recently proposed block-diagonal approximations as special cases, and are extended to convolutional neural networks in this work.
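
As a hedged illustration of what "local modules" can mean for curvature, the recursion below is the generic second-order chain rule for a single module, written as a sketch rather than quoted from the paper. All symbols are assumptions introduced here: a module maps its input $x$ and parameters $\theta$ to an output $z = f(x, \theta)$, and $E$ is the scalar training objective.

```latex
% Assumed notation (not taken from the paper):
%   z = f(x, \theta) is one module, E is the scalar training objective,
%   \partial z / \partial x and \partial z / \partial \theta are the module's Jacobians.
\begin{align}
  \nabla_x^2 E
    &= \left(\frac{\partial z}{\partial x}\right)^{\top}
       \nabla_z^2 E
       \left(\frac{\partial z}{\partial x}\right)
       + \sum_k \frac{\partial E}{\partial z_k}\, \nabla_x^2 z_k ,
  \\
  \nabla_\theta^2 E
    &= \left(\frac{\partial z}{\partial \theta}\right)^{\top}
       \nabla_z^2 E
       \left(\frac{\partial z}{\partial \theta}\right)
       + \sum_k \frac{\partial E}{\partial z_k}\, \nabla_\theta^2 z_k .
\end{align}
```

Each module only needs the matrix $\nabla_z^2 E$ and the gradient $\partial E / \partial z$ passed down from the module above it, plus its own local derivatives. The second line gives the diagonal block for that module's parameters. Dropping the sum over $k$ in every module, and starting the recursion from the Hessian of the loss with respect to the network output, yields the corresponding diagonal blocks of the generalized Gauss-Newton matrix.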

Cite

Text

Dangel et al. "Modular Block-Diagonal Curvature Approximations for Feedforward Architectures." Artificial Intelligence and Statistics, 2020.

Markdown

[Dangel et al. "Modular Block-Diagonal Curvature Approximations for Feedforward Architectures." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/dangel2020aistats-modular/)

BibTeX

@inproceedings{dangel2020aistats-modular,
  title     = {{Modular Block-Diagonal Curvature Approximations for Feedforward Architectures}},
  author    = {Dangel, Felix and Harmeling, Stefan and Hennig, Philipp},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {799--808},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/dangel2020aistats-modular/}
}