Rethinking Exponential Averaging of the Fisher
Abstract
In optimization for machine learning (ML), curvature-matrix (CM) estimates typically rely on an exponential average (EA) of local estimates (giving EA-CM algorithms). This approach has little principled justification but is very often used in practice. In this paper, we draw a connection between EA-CM algorithms and what we call a "Wake of Quadratic Models". The outlined connection allows us to understand what EA-CM algorithms are doing from an optimization perspective. Generalizing from the established connection, we propose a new family of algorithms, KL-Divergence Wake-Regularized Models (KLD-WRM). We give three different practical instantiations of KLD-WRM and show numerically that these outperform K-FAC on MNIST.
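For background: EA-CM methods such as K-FAC maintain their curvature matrix as an exponential average of per-step local estimates, F_t = rho * F_{t-1} + (1 - rho) * F_hat_t. Below is a minimal NumPy sketch of that update; the names `ea_update`, `local_fisher_estimate`, and the decay value `RHO` are illustrative choices, not taken from the paper.

```python
import numpy as np

# Hypothetical decay hyper-parameter; EA-CM implementations commonly
# use values near 1 so that old curvature information decays slowly.
RHO = 0.95

def local_fisher_estimate(per_example_grads):
    """Empirical-Fisher local curvature estimate from one minibatch.

    `per_example_grads` has shape (batch_size, n_params); the averaged
    outer product of gradients is one standard local CM estimate.
    """
    batch_size = per_example_grads.shape[0]
    return per_example_grads.T @ per_example_grads / batch_size

def ea_update(F_prev, F_local, rho=RHO):
    """One EA-CM step: F_t = rho * F_{t-1} + (1 - rho) * F_hat_t."""
    return rho * F_prev + (1.0 - rho) * F_local

# Usage sketch: maintain the exponentially averaged Fisher across steps.
n_params = 10
F = np.zeros((n_params, n_params))
for _ in range(100):
    grads = np.random.randn(32, n_params)  # stand-in for real gradients
    F = ea_update(F, local_fisher_estimate(grads))
```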
Cite
Constantin Octavian Puiu. "Rethinking Exponential Averaging of the Fisher." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022, pp. 327-343. doi:10.1007/978-3-031-26419-1_20. https://mlanthology.org/ecmlpkdd/2022/puiu2022ecmlpkdd-rethinking/
BibTeX
@inproceedings{puiu2022ecmlpkdd-rethinking,
title = {{Rethinking Exponential Averaging of the Fisher}},
author = {Puiu, Constantin Octavian},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {327--343},
doi = {10.1007/978-3-031-26419-1_20},
url = {https://mlanthology.org/ecmlpkdd/2022/puiu2022ecmlpkdd-rethinking/}
}