Big Neural Networks Waste Capacity
Abstract
This article exposes the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggests diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC-2010 show that this may be because the returns for added capacity, measured in terms of training error, diminish sharply, leading to underfitting. This suggests that the optimization method (first-order gradient descent) fails in this regime. Directly attacking this problem, either through the optimization method or the choice of parametrization, may make it possible to improve the generalization error on large datasets, for which a large capacity is required.
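For context, the optimization method named in the abstract, first-order gradient descent, updates parameters using only gradient information, with no curvature. The following is a minimal illustrative sketch, not code from the paper; the toy linear-regression data, learning rate, and iteration count are assumptions chosen purely for demonstration.

```python
import numpy as np

def sgd_step(params, grads, lr=0.01):
    """One first-order gradient descent update: each parameter moves
    opposite its gradient, scaled by the learning rate. No second-order
    (curvature) information is used."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy usage (assumed example): fit y = w*x by minimizing mean squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w = np.array(0.0)
for _ in range(200):
    grad_w = np.mean(2 * (w * x - y) * x)  # d/dw of mean squared error
    (w,) = sgd_step([w], [grad_w], lr=0.1)
print(w)  # converges toward 3.0
```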
Cite
Text
Dauphin and Bengio. "Big Neural Networks Waste Capacity." International Conference on Learning Representations, 2013.

Markdown

[Dauphin and Bengio. "Big Neural Networks Waste Capacity." International Conference on Learning Representations, 2013.](https://mlanthology.org/iclr/2013/dauphin2013iclr-big/)

BibTeX
@inproceedings{dauphin2013iclr-big,
title = {{Big Neural Networks Waste Capacity}},
author = {Dauphin, Yann N. and Bengio, Yoshua},
booktitle = {International Conference on Learning Representations},
year = {2013},
url = {https://mlanthology.org/iclr/2013/dauphin2013iclr-big/}
}