Rethinking Exponential Averaging of the Fisher
Abstract
In optimization for machine learning (ML), curvature-matrix (CM) estimates typically rely on an exponential average (EA) of local estimates (giving EA-CM algorithms). This approach has little principled justification but is very often used in practice. In this paper, we draw a connection between EA-CM algorithms and what we call a "Wake of Quadratic Models". The outlined connection allows us to understand what EA-CM algorithms are doing from an optimization perspective. Generalizing from the established connection, we propose a new family of algorithms, KL-Divergence Wake-Regularized Models (KLD-WRM). We give three different practical instantiations of KLD-WRM and show numerically that these outperform K-FAC on MNIST.
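For background: EA-CM methods such as K-FAC maintain their curvature matrix as an exponential average of per-step local estimates, F_t = rho * F_{t-1} + (1 - rho) * F_hat_t. Below is a minimal NumPy sketch of that update; the names `ea_update`, `local_fisher_estimate`, and the decay value `RHO` are illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical decay hyper-parameter; EA-CM implementations commonly
# use values near 1 so that old curvature information decays slowly.
RHO = 0.95

def local_fisher_estimate(per_example_grads):
    """Empirical-Fisher local curvature estimate from one minibatch.

    `per_example_grads` has shape (batch_size, n_params); the averaged
    outer product of gradients is one standard local CM estimate.
    """
    batch_size = per_example_grads.shape[0]
    return per_example_grads.T @ per_example_grads / batch_size

def ea_update(F_prev, F_local, rho=RHO):
    """One EA-CM step: F_t = rho * F_{t-1} + (1 - rho) * F_hat_t."""
    return rho * F_prev + (1.0 - rho) * F_local

# Usage sketch: maintain the exponentially averaged Fisher across steps.
n_params = 10
F = np.zeros((n_params, n_params))
for _ in range(100):
    grads = np.random.randn(32, n_params)  # stand-in for real gradients
    F = ea_update(F, local_fisher_estimate(grads))
```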
Cite
Constantin Octavian Puiu. "Rethinking Exponential Averaging of the Fisher." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022, pp. 327-343. doi:10.1007/978-3-031-26419-1_20. https://mlanthology.org/ecmlpkdd/2022/puiu2022ecmlpkdd-rethinking/
BibTeX
@inproceedings{puiu2022ecmlpkdd-rethinking,
title = {{Rethinking Exponential Averaging of the Fisher}},
author = {Puiu, Constantin Octavian},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {327--343},
doi = {10.1007/978-3-031-26419-1_20},
url = {https://mlanthology.org/ecmlpkdd/2022/puiu2022ecmlpkdd-rethinking/}
}