Second-Order Forward-Mode Automatic Differentiation for Optimization

Abstract

This paper introduces a second-order hyperplane search, a novel optimization step that generalizes a second-order line search from a line to a $K$-dimensional hyperplane. This, combined with the forward-mode stochastic gradient method, yields a second-order optimization algorithm that consists of forward passes only, completely avoiding the storage overhead of backpropagation. Unlike recent work that relies on directional derivatives (or Jacobian--Vector Products, JVPs), we use hyper-dual numbers to jointly evaluate both directional derivatives and their second-order quadratic terms. As a result, we introduce forward-mode weight perturbation with Hessian information for $K$-dimensional hyperplane search (FoMoH-$K$D). We derive the convergence properties of FoMoH-$K$D and show how it generalizes to Newton's method for $K=D$. We also compare its convergence rate to that of forward gradient descent (FGD) and show that FoMoH-$K$D has an exponential convergence rate compared to FGD's sub-linear convergence for positive definite quadratic functions. We illustrate the utility of this extension and how it might be used to overcome some of the recent challenges of optimizing machine learning models without backpropagation.
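As a rough illustration of the forward pass the abstract describes, the sketch below implements a textbook hyper-dual scalar in Python. It is not the authors' code, and the names `HyperDual` and `value_grad_curvature` are hypothetical: evaluating $f$ at $x + v\epsilon_1 + v\epsilon_2$ (with $\epsilon_1^2 = \epsilon_2^2 = 0$) returns $f(x)$, the directional derivative $v \cdot \nabla f(x)$, and the quadratic term $v^\top H(x)\, v$ in a single forward sweep.

```python
# Minimal hyper-dual number sketch (illustrative only, not the paper's implementation).
# A hyper-dual number a + b*eps1 + c*eps2 + d*eps1*eps2 carries a value, two
# first-order coefficients, and one mixed second-order coefficient.

class HyperDual:
    def __init__(self, real, eps1=0.0, eps2=0.0, eps1eps2=0.0):
        self.real, self.eps1, self.eps2, self.eps1eps2 = real, eps1, eps2, eps1eps2

    def __add__(self, other):
        other = other if isinstance(other, HyperDual) else HyperDual(other)
        return HyperDual(self.real + other.real,
                         self.eps1 + other.eps1,
                         self.eps2 + other.eps2,
                         self.eps1eps2 + other.eps1eps2)

    __radd__ = __add__

    def __mul__(self, other):
        # Product rule with eps1^2 = eps2^2 = 0; only the eps1*eps2 cross term survives.
        other = other if isinstance(other, HyperDual) else HyperDual(other)
        return HyperDual(self.real * other.real,
                         self.real * other.eps1 + self.eps1 * other.real,
                         self.real * other.eps2 + self.eps2 * other.real,
                         self.real * other.eps1eps2 + self.eps1 * other.eps2
                         + self.eps2 * other.eps1 + self.eps1eps2 * other.real)

    __rmul__ = __mul__


def value_grad_curvature(f, x, v):
    """One forward pass: returns f(x), v . grad f(x), and v^T H(x) v."""
    inputs = [HyperDual(xi, vi, vi, 0.0) for xi, vi in zip(x, v)]
    out = f(inputs)
    return out.real, out.eps1, out.eps1eps2


# Example: f(x) = x0^2 + 3*x0*x1 at x = (1, 2) along direction v = (1, 1).
f = lambda z: z[0] * z[0] + 3.0 * z[0] * z[1]
print(value_grad_curvature(f, [1.0, 2.0], [1.0, 1.0]))  # (7.0, 11.0, 8.0)
```

Each operation propagates only four coefficients per tracked quantity, which is consistent with the abstract's claim of forward passes only and no stored computation graph; collecting such terms over the $K$ search directions is what a second-order hyperplane search requires.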

Cite

Text

Cobb et al. "Second-Order Forward-Mode Automatic Differentiation for Optimization." NeurIPS 2024 Workshops: OPT, 2024.

Markdown

[Cobb et al. "Second-Order Forward-Mode Automatic Differentiation for Optimization." NeurIPS 2024 Workshops: OPT, 2024.](https://mlanthology.org/neuripsw/2024/cobb2024neuripsw-secondorder/)

BibTeX

@inproceedings{cobb2024neuripsw-secondorder,
  title     = {{Second-Order Forward-Mode Automatic Differentiation for Optimization}},
  author    = {Cobb, Adam D. and Baydin, Atilim Gunes and Pearlmutter, Barak A. and Jha, Susmit},
  booktitle = {NeurIPS 2024 Workshops: OPT},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/cobb2024neuripsw-secondorder/}
}