Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling
Abstract
Training deep neural networks with a large batch size has shown promising results and benefits many real-world applications. Warmup is one of the nontrivial techniques used to stabilize the convergence of large-batch training. However, warmup is an empirical method, and it remains unknown whether there is a better algorithm with theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We prove the convergence of our algorithm by introducing a new fine-grained analysis of gradient-based methods. Furthermore, the new analysis helps to explain two other empirical tricks, layer-wise adaptive rate scaling and linear learning rate scaling. We conduct extensive experiments and demonstrate that the proposed algorithm outperforms the gradual warmup technique by a large margin and surpasses the convergence of the state-of-the-art large-batch optimizer when training advanced deep neural networks (ResNet, DenseNet, MobileNet) on the ImageNet dataset.
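The abstract describes CLARS only at a high level. As a rough illustration of the layer-wise adaptive rate scaling idea it builds on, the sketch below scales each layer's learning rate by a trust ratio ||w|| / ||∇w|| in a PyTorch-style setup. The function name layerwise_adaptive_step and the eps constant are hypothetical; this is the generic LARS-style update, not the exact CLARS algorithm from the paper.

# Minimal sketch of layer-wise adaptive rate scaling, assuming PyTorch.
# NOT the exact CLARS update; shown only to illustrate per-layer scaling.
import torch

def layerwise_adaptive_step(model, base_lr=0.1, eps=1e-8):
    """Apply one SGD step with a per-layer adaptive learning rate."""
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            w_norm = param.norm()
            g_norm = param.grad.norm()
            # Trust ratio: each layer takes a step of comparable relative
            # size, regardless of the raw gradient magnitude in that layer.
            if w_norm > 0 and g_norm > 0:
                trust_ratio = float(w_norm / (g_norm + eps))
            else:
                trust_ratio = 1.0
            param.add_(param.grad, alpha=-base_lr * trust_ratio)

In a large-batch setting, this call would stand in for a plain SGD step, with base_lr scaled with the batch size in line with the linear learning rate scaling rule mentioned in the abstract.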
Cite
Text
Huo et al. "Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I9.16962

Markdown
[Huo et al. "Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/huo2021aaai-large/) doi:10.1609/AAAI.V35I9.16962

BibTeX
@inproceedings{huo2021aaai-large,
title = {{Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling}},
author = {Huo, Zhouyuan and Gu, Bin and Huang, Heng},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2021},
pages = {7883--7890},
doi = {10.1609/AAAI.V35I9.16962},
url = {https://mlanthology.org/aaai/2021/huo2021aaai-large/}
}