Adam Through a Second-Order Lens
Abstract
Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). We seek to combine the benefits of both approaches into a single computationally-efficient algorithm. Noting that second-order methods often depend on stabilising heuristics (such as Levenberg-Marquardt damping), we propose AdamQLR: an optimiser combining damping and learning rate selection techniques from K-FAC with the update directions proposed by Adam, inspired by considering Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales, achieving competitive generalisation performance vs runtime.
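The abstract only summarises the method at a high level; as a rough illustration of the idea (not the authors' implementation, which draws on K-FAC-style heuristics), the sketch below applies it to a toy least-squares problem: take Adam's proposed update direction, choose the step size by minimising a damped local quadratic model of the loss, and adapt the Levenberg-Marquardt damping from the observed-vs-predicted reduction ratio. The toy problem, damping thresholds, and adjustment factors are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code) of an AdamQLR-style step on a
# toy least-squares problem: Adam direction + quadratic-model learning rate
# + Levenberg-Marquardt damping adaptation.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))   # toy regression design matrix
b = rng.normal(size=50)         # toy targets

def loss(theta):
    r = A @ theta - b
    return 0.5 * r @ r

def grad(theta):
    return A.T @ (A @ theta - b)

def hvp(theta, v):
    # Exact Hessian-vector product for least squares: H = A^T A.
    return A.T @ (A @ v)

def adamqlr_sketch(steps=200, beta1=0.9, beta2=0.999, eps=1e-8, lam=1.0):
    theta = np.zeros(A.shape[1])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        # Standard Adam moment estimates give the update *direction* d.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        d = (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)

        # Learning rate minimising the damped quadratic model
        #   q(alpha) = f(theta) - alpha * g.d + 0.5 * alpha^2 * d.(H + lam I) d
        curv = d @ hvp(theta, d) + lam * (d @ d)
        alpha = (g @ d) / max(curv, 1e-12)

        # Levenberg-Marquardt-style damping update from the reduction ratio
        # (thresholds/factors here are illustrative choices).
        f_old, f_new = loss(theta), loss(theta - alpha * d)
        predicted = alpha * (g @ d) - 0.5 * alpha ** 2 * curv
        rho = (f_old - f_new) / max(predicted, 1e-12)
        if rho > 0.75:
            lam *= 0.9   # quadratic model is trustworthy: reduce damping
        elif rho < 0.25:
            lam *= 1.1   # quadratic model is poor: increase damping

        theta = theta - alpha * d
    return theta

theta = adamqlr_sketch()
print("final loss:", loss(theta))
```

On this toy problem the Hessian-vector product is exact and cheap; in a deep-learning setting the curvature term would instead come from an approximation such as the Fisher or Gauss-Newton matrix, which is where the K-FAC-derived heuristics mentioned in the abstract come in.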
Cite
Text
Clarke et al. "Adam Through a Second-Order Lens." NeurIPS 2023 Workshops: OPT, 2023.
Markdown
[Clarke et al. "Adam Through a Second-Order Lens." NeurIPS 2023 Workshops: OPT, 2023.](https://mlanthology.org/neuripsw/2023/clarke2023neuripsw-adam/)
BibTeX
@inproceedings{clarke2023neuripsw-adam,
  title = {{Adam Through a Second-Order Lens}},
  author = {Clarke, Ross M and Su, Baiyu and Hernández-Lobato, José Miguel},
  booktitle = {NeurIPS 2023 Workshops: OPT},
  year = {2023},
  url = {https://mlanthology.org/neuripsw/2023/clarke2023neuripsw-adam/}
}