A Majorization-Minimization Algorithm for (multiple) Hyperparameter Learning
Abstract
We present a general Bayesian framework for hyperparameter tuning in $L_2$-regularized supervised learning models. Paradoxically, our algorithm works by first analytically integrating out the hyperparameters from the model. We find a local optimum of the resulting nonconvex optimization problem efficiently using a majorization-minimization (MM) algorithm, in which the non-convex problem is reduced to a series of convex $L_2$-regularized parameter estimation tasks. The principal appeal of our method is its simplicity: the updates for choosing the $L_2$-regularized subproblems in each step are trivial to implement (or even perform by hand), and each subproblem can be efficiently solved by adapting existing solvers. Empirical results on a variety of supervised learning models show that our algorithm is competitive with both grid-search and gradient-based algorithms, but is more efficient and far easier to implement.
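The abstract's alternation — solving a convex $L_2$-regularized subproblem, then making a trivial closed-form update that picks the next subproblem — can be sketched as follows. This is an illustrative MM-style loop for ridge regression, not the paper's exact derivation: the update rule for the effective regularization weight `lam` and the constants `a`, `b` are assumptions standing in for whatever the integrated-out hyperparameter prior induces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ridge-regression data (illustrative only).
n, d = 50, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Stylized MM loop: after integrating out the hyperparameter, each
# iteration majorizes the nonconvex objective with a convex
# L2-regularized surrogate.  Minimizing the surrogate is a plain
# ridge solve; the weight `lam` is then refreshed by a trivial
# closed-form update (an assumed form, for illustration).
lam = 1.0
a, b = 1.0, 1.0  # hypothetical prior-like constants (assumptions)
for _ in range(100):
    # Convex subproblem: ridge regression with lam held fixed.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # Closed-form update choosing the next subproblem.
    lam = (d + 2 * a) / (w @ w + 2 * b)
```

Each subproblem here is a single linear solve, so any existing ridge solver can be reused unchanged, which mirrors the paper's claim that the method adapts existing $L_2$-regularized solvers.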
Cite
Text
Foo et al. "A Majorization-Minimization Algorithm for (multiple) Hyperparameter Learning." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553415
Markdown
[Foo et al. "A Majorization-Minimization Algorithm for (multiple) Hyperparameter Learning." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/foo2009icml-majorization/) doi:10.1145/1553374.1553415
BibTeX
@inproceedings{foo2009icml-majorization,
title = {{A Majorization-Minimization Algorithm for (multiple) Hyperparameter Learning}},
author = {Foo, Chuan-Sheng and Do, Chuong B. and Ng, Andrew Y.},
booktitle = {International Conference on Machine Learning},
year = {2009},
pages = {321--328},
doi = {10.1145/1553374.1553415},
url = {https://mlanthology.org/icml/2009/foo2009icml-majorization/}
}