Gradient-Based Optimization of Hyperparameters

Bengio, Yoshua

doi:10.1162/089976600300015187

NeCo 2000 pp. 1889-1900

Abstract

Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article, we present a methodology to optimize several hyperparameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.
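
To make the general case concrete, here is a minimal sketch of a hyperparameter gradient obtained with the implicit function theorem. It is not taken from the article: the one-dimensional ridge-regression setup, the toy data, and every name (`train`, `validation_loss`, `hypergradient`, `lam`) are assumptions chosen only for illustration. The training criterion is C(w, lam) = 0.5 * sum((w*x - y)^2) + 0.5 * lam * w^2, and the model selection criterion E is the squared error on held-out data.

```python
# Illustrative sketch only: 1-D ridge regression with one hyperparameter `lam`.
# All names and data here are hypothetical, not from the article.

# Toy training and validation sets (made up for the example).
xs, ys = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
xv, yv = [1.5, 2.5], [1.4, 2.6]


def train(lam, xs, ys):
    """Minimize C(w, lam) = 0.5*sum((w*x - y)^2) + 0.5*lam*w^2 in closed form."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def validation_loss(w, xv, yv):
    """Model selection criterion E(w) = 0.5*sum((w*x - y)^2) on held-out data."""
    return 0.5 * sum((w * x - y) ** 2 for x, y in zip(xv, yv))


def hypergradient(lam, xs, ys, xv, yv):
    """dE/dlam through the trained weight, using the implicit function theorem.

    At the optimum dC/dw = 0, so dw/dlam = -(d2C/dw2)^(-1) * d2C/(dw dlam).
    """
    w = train(lam, xs, ys)
    hessian = sum(x * x for x in xs) + lam  # d2C/dw2
    mixed = w                               # d2C/(dw dlam)
    dw_dlam = -mixed / hessian
    dE_dw = sum((w * x - y) * x for x, y in zip(xv, yv))
    return dE_dw * dw_dlam


def finite_difference(lam, xs, ys, xv, yv, eps=1e-6):
    """Central-difference estimate of dE/dlam, used only as a sanity check."""
    up = validation_loss(train(lam + eps, xs, ys), xv, yv)
    down = validation_loss(train(lam - eps, xs, ys), xv, yv)
    return (up - down) / (2 * eps)


if __name__ == "__main__":
    lam = 1.0
    print("implicit-function gradient:", hypergradient(lam, xs, ys, xv, yv))
    print("finite-difference estimate:", finite_difference(lam, xs, ys, xv, yv))
```

With several hyperparameters, the scalar `hessian` would become a matrix and the division a linear solve. In this toy quadratic setting the two printed values should agree closely, which is a simple way to check a hand-derived hypergradient.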

Cite

Text

Bengio. "Gradient-Based Optimization of Hyperparameters." Neural Computation, 2000. doi:10.1162/089976600300015187

Markdown

[Bengio. "Gradient-Based Optimization of Hyperparameters." Neural Computation, 2000.](https://mlanthology.org/neco/2000/bengio2000neco-gradientbased/) doi:10.1162/089976600300015187

BibTeX

@article{bengio2000neco-gradientbased,
  title     = {{Gradient-Based Optimization of Hyperparameters}},
  author    = {Bengio, Yoshua},
  journal   = {Neural Computation},
  year      = {2000},
  pages     = {1889-1900},
  doi       = {10.1162/089976600300015187},
  volume    = {12},
  url       = {https://mlanthology.org/neco/2000/bengio2000neco-gradientbased/}
}