Sophia: A Scalable Stochastic Second-Order Optimizer for Language Model Pre-Training
Abstract
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid changes in the Hessian along the trajectory. Sophia only estimates the diagonal Hessian every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT models of sizes ranging from 125M to 1.5B, Sophia achieves a 2x speed-up compared to Adam: it reaches the same perplexity with 50% fewer steps, less total compute, and reduced wall-clock time.
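The update rule described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The names `sophia_like_step` and `run_toy`, the hyperparameter values, the `eps` floor on the denominator, the initialization of the Hessian average from its first estimate, and the toy quadratic loss are all assumptions made for illustration. The exact diagonal Hessian of the toy loss stands in for the light-weight estimator, which the abstract does not specify.

```python
def clip(x, bound=1.0):
    """Element-wise clipping to [-bound, bound]."""
    return max(-bound, min(bound, x))


def sophia_like_step(theta, grad, m, h, lr=0.1, beta1=0.9, eps=1e-12):
    """One clipped, diagonally preconditioned update (illustrative sketch).

    theta, grad, m, h are equal-length lists of floats:
    parameters, current gradient, gradient moving average,
    and moving average of the diagonal Hessian estimate.
    """
    new_theta, new_m = [], []
    for t, g, m_i, h_i in zip(theta, grad, m, h):
        # Moving average of the gradients.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        # Divide by the Hessian average; the eps floor (an assumption)
        # keeps the denominator positive when curvature is <= 0.
        ratio = m_i / max(h_i, eps)
        # Clipping bounds each coordinate's move by lr.
        new_theta.append(t - lr * clip(ratio))
        new_m.append(m_i)
    return new_theta, new_m


def run_toy(steps=200, k=10, beta2=0.99):
    """Minimize 0.5 * sum(c_i * theta_i**2) on a badly scaled toy problem."""
    curvature = [100.0, 1.0]  # exact diagonal Hessian of the toy loss
    theta = [1.0, 1.0]
    m = [0.0, 0.0]
    h = None
    for step in range(steps):
        grad = [c * t for c, t in zip(curvature, theta)]
        if step % k == 0:
            # Refresh the Hessian average only every k steps.
            h_hat = list(curvature)  # stand-in for a stochastic estimator
            if h is None:
                h = h_hat
            else:
                h = [beta2 * a + (1.0 - beta2) * b for a, b in zip(h, h_hat)]
        theta, m = sophia_like_step(theta, grad, m, h)
    return theta


if __name__ == "__main__":
    print(run_toy())
```

In this sketch both coordinates are driven toward zero at a similar rate despite a 100x difference in curvature, which is the intended effect of dividing by the Hessian average. When the Hessian entry is tiny or negative, the ratio saturates at the clipping bound and the coordinate moves by at most `lr`.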
Cite
Text
Liu et al. "Sophia: A Scalable Stochastic Second-Order Optimizer for Language Model Pre-Training." International Conference on Learning Representations, 2024.
Markdown
[Liu et al. "Sophia: A Scalable Stochastic Second-Order Optimizer for Language Model Pre-Training." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/liu2024iclr-sophia/)
BibTeX
@inproceedings{liu2024iclr-sophia,
title = {{Sophia: A Scalable Stochastic Second-Order Optimizer for Language Model Pre-Training}},
author = {Liu, Hong and Li, Zhiyuan and Hall, David Leo Wright and Liang, Percy and Ma, Tengyu},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/liu2024iclr-sophia/}
}