Trade-Offs of Diagonal Fisher Information Matrix Estimators

Abstract

The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks. It elucidates insightful theories and useful tools to understand and optimize neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two popular estimators whose accuracy and sample complexity depend on their associated variances. We derive bounds of the variances and instantiate them in neural networks for regression and classification. We navigate trade-offs for both estimators based on analytical and numerical studies. We find that the variance quantities depend on the non-linearity w.r.t. different parameter groups and should not be neglected when estimating the Fisher information.
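To give a concrete sense of the two kinds of diagonal estimators the abstract refers to, the NumPy sketch below compares a squared-score (first-order) Monte Carlo estimate of the diagonal Fisher with a Hessian-based (second-order) estimate for a toy logistic-regression model. This is a minimal illustration under assumed names and a toy model, not the paper's exact formulation or bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy model: p(y = 1 | x) = sigmoid(w @ x) for a single input x (illustrative only).
d = 5
w = rng.normal(size=d)
x = rng.normal(size=d)
p = sigmoid(w @ x)

# Exact diagonal Fisher for this model: E_y[(y - p)^2] * x_i^2 = p(1 - p) * x_i^2.
fisher_diag_exact = p * (1.0 - p) * x**2

def estimator_score(n_samples):
    """First-order estimator: average of squared per-sample scores, y ~ p(y|x)."""
    y = rng.binomial(1, p, size=n_samples)
    scores = (y[:, None] - p) * x[None, :]   # d/dw log p(y|x) for each sampled y
    return (scores**2).mean(axis=0)

def estimator_hessian():
    """Second-order estimator: diagonal of the negative Hessian of log p(y|x)."""
    # For logistic regression the Hessian does not depend on y,
    # so this estimator has zero variance here; in general it does not.
    return p * (1.0 - p) * x**2

print("exact        :", fisher_diag_exact)
print("score  (n=100):", estimator_score(100))
print("hessian       :", estimator_hessian())
```

Running the sketch shows the squared-score estimate fluctuating around the exact diagonal while the Hessian-based estimate is deterministic for this particular model, illustrating why the variances of the two estimators, and hence their sample complexities, can differ.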

Cite

Text

Soen and Sun. "Trade-Offs of Diagonal Fisher Information Matrix Estimators." Neural Information Processing Systems, 2024. doi:10.52202/079017-0191

Markdown

[Soen and Sun. "Trade-Offs of Diagonal Fisher Information Matrix Estimators." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/soen2024neurips-tradeoffs/) doi:10.52202/079017-0191

BibTeX

@inproceedings{soen2024neurips-tradeoffs,
  title     = {{Trade-Offs of Diagonal Fisher Information Matrix Estimators}},
  author    = {Soen, Alexander and Sun, Ke},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0191},
  url       = {https://mlanthology.org/neurips/2024/soen2024neurips-tradeoffs/}
}