ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Abstract

Incorporating second-order curvature information into machine learning optimization algorithms can be subtle, and doing so naïvely can lead to high per-iteration costs associated with forming the Hessian and performing the associated linear system solve. To address this, we introduce ADAHESSIAN, a new stochastic optimization algorithm. ADAHESSIAN directly incorporates approximate curvature information from the loss function, and it includes several novel performance-improving features, including: (i) a fast Hutchinson-based method to approximate the curvature matrix with low computational overhead; (ii) spatial averaging to reduce the variance of the second derivative; and (iii) a root-mean-square exponential moving average to smooth out variations of the second derivative across different iterations. We perform extensive tests on NLP, CV, and recommendation system tasks, and ADAHESSIAN achieves state-of-the-art results. In particular, we find that ADAHESSIAN: (i) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14 and 2.7/1.0 PPL on PTB/Wikitext-103; (ii) outperforms AdamW for SqueezeBERT by 0.41 points on GLUE; (iii) achieves 1.45%/5.55% higher accuracy on ResNet32/ResNet18 on Cifar10/ImageNet as compared to Adam; and (iv) achieves 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. The cost per iteration of ADAHESSIAN is comparable to first-order methods, and ADAHESSIAN exhibits improved robustness towards variations in hyperparameter values. The code for ADAHESSIAN is open-sourced and publicly available [1].
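The Hutchinson-based curvature estimate mentioned in (i) rests on a simple identity: for a Rademacher random vector z (i.i.d. ±1 entries), the expectation of z ⊙ (Hz) equals the diagonal of H, so the diagonal can be estimated from Hessian-vector products alone, without ever forming H. The following is a minimal NumPy sketch of that estimator; the function name `hutchinson_diag` and the toy quadratic loss are illustrative choices, not from the paper, and in a real training loop the Hessian-vector product would come from automatic differentiation rather than an explicit matrix.

```python
import numpy as np

def hutchinson_diag(hvp, dim, n_samples=1000, seed=None):
    """Estimate diag(H) using Hutchinson's method.

    hvp: callable z -> H @ z (a Hessian-vector product oracle).
    For Rademacher z, E[z * (H z)] = diag(H), so averaging
    z * hvp(z) over many samples converges to the diagonal.
    """
    rng = np.random.default_rng(seed)
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.integers(0, 2, size=dim) * 2.0 - 1.0  # entries in {-1, +1}
        est += z * hvp(z)  # elementwise product with the HVP
    return est / n_samples

# Toy example: quadratic loss f(w) = 0.5 * w^T H w, whose Hessian is H,
# so the Hessian-vector product is simply H @ z.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
diag_est = hutchinson_diag(lambda z: H @ z, dim=2, n_samples=2000, seed=0)
# diag_est approximates [3.0, 2.0], the true diagonal of H.
```

In ADAHESSIAN this per-parameter curvature estimate is then smoothed further, via the spatial averaging of (ii) and the exponential moving average of (iii), before being used in place of the squared-gradient term of Adam-style updates.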

Cite

Text

Yao et al. "ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I12.17275

Markdown

[Yao et al. "ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/yao2021aaai-adahessian/) doi:10.1609/AAAI.V35I12.17275

BibTeX

@inproceedings{yao2021aaai-adahessian,
  title     = {{ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning}},
  author    = {Yao, Zhewei and Gholami, Amir and Shen, Sheng and Mustafa, Mustafa and Keutzer, Kurt and Mahoney, Michael W.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {10665-10673},
  doi       = {10.1609/AAAI.V35I12.17275},
  url       = {https://mlanthology.org/aaai/2021/yao2021aaai-adahessian/}
}