Gradient-Based Optimization of Hyperparameters
Abstract
Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article we present a methodology to optimize several hyper-parameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyper-parameter gradient involving second derivatives of the training criterion.
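To make the implicit-function-theorem idea concrete, here is a minimal sketch for one simple case: ridge regression with a single weight-decay hyperparameter. It is an illustration, not the paper's implementation. The function names (`fit_ridge`, `val_loss`, `hypergradient`), the synthetic data, and the choice of a squared-error validation criterion are all assumptions made for this example. It also reuses a Cholesky factor to solve linear systems rather than backpropagating through the decomposition as the abstract describes. For the training criterion C(w, lam) = 0.5*||Xw - y||^2 + 0.5*lam*||w||^2, the minimizer satisfies dC/dw = 0, so dw/dlam = -H^{-1} (d^2C / dw dlam) = -H^{-1} w with H = X^T X + lam*I.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize 0.5*||X w - y||^2 + 0.5*lam*||w||^2 using a Cholesky solve."""
    H = X.T @ X + lam * np.eye(X.shape[1])  # Hessian of the training criterion
    L = np.linalg.cholesky(H)               # H = L @ L.T
    w = np.linalg.solve(L.T, np.linalg.solve(L, X.T @ y))
    return w, L

def val_loss(w, Xv, yv):
    """Model selection criterion (assumed here): squared error on held-out data."""
    r = Xv @ w - yv
    return 0.5 * r @ r

def hypergradient(X, y, Xv, yv, lam):
    """Gradient of the validation loss with respect to lam."""
    w, L = fit_ridge(X, y, lam)
    g = Xv.T @ (Xv @ w - yv)                # dE/dw at the trained weights
    # Implicit function theorem: dw/dlam = -H^{-1} w for this criterion.
    dw_dlam = -np.linalg.solve(L.T, np.linalg.solve(L, w))
    return g @ dw_dlam

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X, Xv = rng.normal(size=(40, 5)), rng.normal(size=(20, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=40)
yv = Xv @ w_true + 0.1 * rng.normal(size=20)

lam, eps = 1.0, 1e-5
analytic = hypergradient(X, y, Xv, yv, lam)
# Central finite difference as a sanity check on the analytic gradient.
numeric = (val_loss(fit_ridge(X, y, lam + eps)[0], Xv, yv)
           - val_loss(fit_ridge(X, y, lam - eps)[0], Xv, yv)) / (2 * eps)
print(analytic, numeric)
```

The two printed values should agree closely. With several hyperparameters, the same pattern would apply with one column of d^2C / dw dlam per hyperparameter, and the resulting gradient could be passed to any gradient-based optimizer.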
Cite
Text
Bengio. "Gradient-Based Optimization of Hyperparameters." Neural Computation, 2000. doi:10.1162/089976600300015187
Markdown
[Bengio. "Gradient-Based Optimization of Hyperparameters." Neural Computation, 2000.](https://mlanthology.org/neco/2000/bengio2000neco-gradientbased/) doi:10.1162/089976600300015187
BibTeX
@article{bengio2000neco-gradientbased,
title = {{Gradient-Based Optimization of Hyperparameters}},
author = {Bengio, Yoshua},
journal = {Neural Computation},
year = {2000},
pages = {1889--1900},
doi = {10.1162/089976600300015187},
volume = {12},
url = {https://mlanthology.org/neco/2000/bengio2000neco-gradientbased/}
}