Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms
Abstract
When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that these objectives are inherently non-differentiable. A popular approach is to relax the objectives continuously so that they provide gradients, enabling learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them (even in isolation) hard to optimize. In such settings, the loss function is the bottleneck when training a deep neural network. We present Newton Losses, a method for improving the performance of existing hard-to-optimize losses by exploiting their second-order information via their empirical Fisher and Hessian matrices. Instead of training the neural network with second-order techniques, we use only the loss function's second-order information to replace it with a Newton Loss, while the network itself is trained with gradient descent. This makes our method computationally efficient. We apply Newton Losses to eight differentiable algorithms for sorting and shortest paths, achieving significant improvements for less-optimized differentiable algorithms and consistent improvements even for well-optimized ones.
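To make the core idea concrete, below is a minimal PyTorch sketch of the Hessian-based variant outlined in the abstract (the paper also describes an empirical-Fisher variant, not shown here). It computes a Tikhonov-regularized Newton step on the loss alone and returns an MSE surrogate whose gradient with respect to the network output is that Newton step; the network is then trained by ordinary gradient descent on the surrogate. The names `newton_loss`, `loss_fn`, and the regularizer `tau` are illustrative choices of ours, not the authors' reference implementation.

```python
import torch

def newton_loss(loss_fn, y, tau=1e-1):
    """Sketch of a Hessian-based Newton Loss surrogate.

    Given a scalar loss `loss_fn` over the network output `y`,
    compute a Tikhonov-regularized Newton step z* = y - (H + tau*I)^{-1} g
    on the loss, and return 0.5 * ||y - z*||^2, whose gradient
    w.r.t. y is exactly that Newton step.
    """
    # Work on a detached, flattened copy so that only the loss
    # function itself is differentiated here, not the network.
    y0 = y.detach().reshape(-1).requires_grad_(True)
    flat_loss = lambda v: loss_fn(v.view_as(y))

    g = torch.autograd.grad(flat_loss(y0), y0)[0]                  # gradient of the loss
    H = torch.autograd.functional.hessian(flat_loss, y0.detach())  # Hessian of the loss

    n = H.shape[0]
    eye = torch.eye(n, dtype=H.dtype, device=H.device)
    step = torch.linalg.solve(H + tau * eye, g)      # (H + tau*I)^{-1} g

    z_star = (y0.detach() - step).view_as(y)         # Newton target z*, constant w.r.t. y
    return 0.5 * ((y - z_star) ** 2).sum()           # MSE surrogate for backprop
```

A usage sketch (`model` and `relaxed_sort_loss` are placeholders for a network and a differentiable-algorithm loss such as a relaxed ranking loss):

```python
y = model(x)                                    # forward pass through the network
surrogate = newton_loss(relaxed_sort_loss, y)   # Newton Loss in place of the raw loss
surrogate.backward()                            # backpropagates the Newton direction
optimizer.step()                                # standard gradient-descent update
```

Because the second-order information is computed only with respect to the loss's (typically low-dimensional) input rather than the network's parameters, the extra cost over plain gradient descent stays small.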
Cite
Text
Petersen et al. "Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms." Neural Information Processing Systems, 2024. doi:10.52202/079017-0821

Markdown
[Petersen et al. "Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/petersen2024neurips-newton/) doi:10.52202/079017-0821

BibTeX
@inproceedings{petersen2024neurips-newton,
  title     = {{Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms}},
  author    = {Petersen, Felix and Borgelt, Christian and Sutter, Tobias and Kuehne, Hilde and Deussen, Oliver and Ermon, Stefano},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0821},
  url       = {https://mlanthology.org/neurips/2024/petersen2024neurips-newton/}
}