Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization
Abstract
Stochastic gradient-based optimization is crucial for training neural networks. While popular approaches heuristically adapt the step size and direction by rescaling gradients, a more principled approach to improve optimizers requires second-order information. Such methods precondition the gradient using the objective's Hessian. Yet, computing the Hessian is usually expensive, and effectively using second-order information in the stochastic gradient setting is non-trivial. We propose using Information-Theoretic Trust Region Optimization (arTuRO) for improved updates with uncertain second-order information. By modeling the network parameters as a Gaussian distribution and using a Kullback-Leibler divergence-based trust region, our approach takes bounded steps that account for the objective's curvature and the uncertainty in the parameters. Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process. We approximate the diagonal elements of the Hessian from stochastic gradients using a simple recursive least squares approach, constructing a model of the expected Hessian over time using only first-order information. We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD.
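The abstract mentions approximating the Hessian's diagonal from stochastic gradients with a recursive least squares scheme. A minimal per-coordinate sketch of that general idea is below: it fits dg ≈ h ⊙ dθ from successive parameter/gradient differences via scalar RLS with a forgetting factor. The function name, the forgetting factor `lam`, and the initialization `p0` are illustrative assumptions, not the paper's exact formulation or values.

```python
import numpy as np

def make_rls_hessian_estimator(dim, lam=0.99, p0=1.0):
    """Per-coordinate recursive least squares fitting dgrad ≈ h * dtheta.

    `lam` (forgetting factor) and `p0` (initial covariance) are
    illustrative choices, not values from the paper.
    """
    h = np.zeros(dim)      # running diagonal-Hessian estimate
    P = np.full(dim, p0)   # per-coordinate RLS covariance

    def update(dtheta, dgrad):
        """Incorporate one secant pair (parameter step, gradient change)."""
        nonlocal h, P
        x, y = dtheta, dgrad
        k = P * x / (lam + x * P * x)   # RLS gain, elementwise
        h = h + k * (y - h * x)         # correct estimate toward new pair
        P = (P - k * x * P) / lam       # covariance update with forgetting
        return h.copy()

    return update
```

For a quadratic objective with diagonal curvature d, gradient differences satisfy dg = d ⊙ dθ exactly, so the estimate converges to d; the forgetting factor lets the estimate track curvature that changes over the course of training.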
Cite
Text
Dahlinger et al. "Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization." NeurIPS 2023 Workshops: OPT, 2023.

Markdown

[Dahlinger et al. "Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization." NeurIPS 2023 Workshops: OPT, 2023.](https://mlanthology.org/neuripsw/2023/dahlinger2023neuripsw-informationtheoretic/)

BibTeX
@inproceedings{dahlinger2023neuripsw-informationtheoretic,
title = {{Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization}},
author = {Dahlinger, Philipp and Becker, Philipp and Hüttenrauch, Maximilian and Neumann, Gerhard},
booktitle = {NeurIPS 2023 Workshops: OPT},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/dahlinger2023neuripsw-informationtheoretic/}
}