On Optimization Methods for Deep Learning
Abstract
The predominant methodology for training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug, and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as limited-memory BFGS (L-BFGS) and conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms. In our experiments, the differences between L-BFGS/CG and SGDs are more pronounced if we consider algorithmic extensions (e.g., sparsity regularization) and hardware extensions (e.g., GPUs or computer clusters). Our experiments with distributed optimization support the use of L-BFGS with locally connected networks and convolutional neural networks. Using L-BFGS, our convolutional network model achieves 0.69% error on the standard MNIST dataset. This is a state-of-the-art result on MNIST among algorithms that do not use distortions or pretraining.
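To make the abstract's contrast concrete, the sketch below compares hand-tuned SGD with off-the-shelf L-BFGS on a tiny logistic-regression problem. This is an illustrative assumption, not the paper's experimental setup: the dataset is synthetic, and SciPy's `L-BFGS-B` stands in for the batch L-BFGS with line search the paper discusses.

```python
# Illustrative only: SGD needs a manually tuned learning rate and many
# small steps, while L-BFGS with line search works "off the shelf".
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = (X @ true_w > 0).astype(float)  # synthetic binary labels

def loss_and_grad(w):
    """Logistic loss and its gradient (batch objective for L-BFGS)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid predictions
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# SGD: one example at a time, step size chosen by hand.
w_sgd = np.zeros(5)
lr = 0.5  # hand-tuned; too large diverges, too small is slow
for epoch in range(100):
    for i in rng.permutation(len(y)):
        p_i = 1.0 / (1.0 + np.exp(-(X[i] @ w_sgd)))
        w_sgd -= lr * (p_i - y[i]) * X[i]

# L-BFGS: the line search picks step sizes automatically.
res = minimize(loss_and_grad, np.zeros(5), jac=True, method="L-BFGS-B")
```

Both optimizers reach a low loss here; the practical difference the paper emphasizes is that L-BFGS required no learning-rate tuning and uses large batches, which parallelize more naturally.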
Cite

Text:
Le et al. "On Optimization Methods for Deep Learning." International Conference on Machine Learning, 2011.

Markdown:
[Le et al. "On Optimization Methods for Deep Learning." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/le2011icml-optimization/)

BibTeX:
@inproceedings{le2011icml-optimization,
title = {{On Optimization Methods for Deep Learning}},
author = {Le, Quoc V. and Ngiam, Jiquan and Coates, Adam and Lahiri, Abhik and Prochnow, Bobby and Ng, Andrew Y.},
booktitle = {International Conference on Machine Learning},
year = {2011},
pages = {265--272},
url = {https://mlanthology.org/icml/2011/le2011icml-optimization/}
}