ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Abstract
Incorporating second-order curvature information into machine learning optimization algorithms can be subtle, and doing so naïvely can lead to high per-iteration costs associated with forming the Hessian and performing the associated linear system solve. To address this, we introduce ADAHESSIAN, a new stochastic optimization algorithm. ADAHESSIAN directly incorporates approximate curvature information from the loss function, and it includes several novel performance-improving features, including: (i) a fast Hutchinson based method to approximate the curvature matrix with low computational overhead; (ii) spatial averaging to reduce the variance of the second derivative; and (iii) a root-mean-square exponential moving average to smooth out variations of the second derivative across different iterations. We perform extensive tests on NLP, CV, and recommendation system tasks, and ADAHESSIAN achieves state-of-the-art results. In particular, we find that ADAHESSIAN: (i) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14 and by 2.7/1.0 PPL on PTB/Wikitext-103; (ii) outperforms AdamW for SqueezeBert by 0.41 points on GLUE; (iii) achieves 1.45%/5.55% higher accuracy on ResNet32/ResNet18 on Cifar10/ImageNet as compared to Adam; and (iv) achieves a 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. The cost per iteration of ADAHESSIAN is comparable to that of first-order methods, and ADAHESSIAN exhibits improved robustness towards variations in hyperparameter values. The code for ADAHESSIAN is open-sourced and publicly available [1].
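The core of feature (i) is Hutchinson's randomized estimator for the diagonal of the Hessian: for a Rademacher vector z (entries ±1), the expectation of z ⊙ (Hz) equals diag(H), so the diagonal can be estimated from Hessian-vector products without ever forming H. The following is a minimal NumPy sketch of that estimator, not the paper's implementation; the function name `hutchinson_diag` and the toy matrix are illustrative, and in practice the `hvp` callable would come from automatic differentiation.

```python
import numpy as np

def hutchinson_diag(hvp, dim, num_samples=2000, rng=None):
    """Estimate diag(H) via Hutchinson's method.

    hvp: callable z -> H @ z (a Hessian-vector product oracle).
    Returns the average of z * (H @ z) over Rademacher vectors z,
    whose expectation is the diagonal of H.
    """
    rng = rng or np.random.default_rng(0)
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher vector
        est += z * hvp(z)                      # elementwise z ⊙ (Hz)
    return est / num_samples

# Toy check on an explicit symmetric matrix (stands in for the Hessian).
H = np.array([[2.0, 0.3],
              [0.3, 5.0]])
d = hutchinson_diag(lambda z: H @ z, dim=2)
# d is close to the true diagonal [2.0, 5.0]
```

The estimate concentrates around the true diagonal as `num_samples` grows; the paper's contribution is making a small number of such samples usable per iteration via variance reduction (spatial averaging and the exponential moving average).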
Cite
Text
Yao et al. "ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I12.17275
Markdown
[Yao et al. "ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/yao2021aaai-adahessian/) doi:10.1609/AAAI.V35I12.17275
BibTeX
@inproceedings{yao2021aaai-adahessian,
title = {{ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning}},
author = {Yao, Zhewei and Gholami, Amir and Shen, Sheng and Mustafa, Mustafa and Keutzer, Kurt and Mahoney, Michael W.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2021},
pages = {10665-10673},
doi = {10.1609/AAAI.V35I12.17275},
url = {https://mlanthology.org/aaai/2021/yao2021aaai-adahessian/}
}