Lion's Sign Noise Can Make Training More Stable
Abstract
Lion is a novel optimization method that has outperformed traditional optimizers like Adam across a variety of tasks. Despite its empirical success, the reasons behind Lion's superiority remain unclear. In this paper, we investigate the mechanisms contributing to Lion's enhanced performance, focusing on the structured noise introduced by the use of the sign function in gradient updates. We characterize this noise by the angle of rotation between a vector and its signum. We inject this noise as a random fixed-angle rotation into normalized updates and analyze how the performance of this method compares to that of Lion. We demonstrate that this method has stronger performance than Lion in our setting. This approach reveals a relationship between the learning rate and the noise specific to the Lion method, providing insights into its improved performance metrics. Additionally, we identify an effect we term "momentum tracing" in neural networks with normalization layers and ReLU activations, which can significantly destabilize the training process. Our analysis demonstrates that the rotation noise inherent in Lion mitigates the negative impact of "momentum tracing", leading to more stable learning. These findings offer theoretical justification for Lion's effectiveness and suggest avenues for developing more robust optimization algorithms.
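The abstract characterizes sign noise by the angle of rotation between a vector and its signum. As a minimal sketch of that quantity only (not the authors' implementation), the angle can be computed from the cosine similarity between `v` and `sign(v)`. The function name `sign_rotation_angle` and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def sign_rotation_angle(v):
    """Return the angle in radians between v and sign(v).

    Hypothetical helper for illustration; not taken from the paper.
    """
    v = np.asarray(v, dtype=float)
    s = np.sign(v)  # elementwise signum; zero entries stay zero
    denom = np.linalg.norm(v) * np.linalg.norm(s)
    if denom == 0.0:
        raise ValueError("angle is undefined for the zero vector")
    cos = np.dot(v, s) / denom
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal(10_000)  # stand-in for a gradient-like vector
    print(np.degrees(sign_rotation_angle(g)))
```

For a vector whose nonzero entries all share the same magnitude the angle is zero, since the signum is then just a rescaling of the vector. For vectors with i.i.d. Gaussian entries the cosine concentrates near sqrt(2/π) as the dimension grows, so the angle is roughly 37 degrees. Real gradients need not follow either pattern.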
Cite
Text
Elistratov et al. "Lion's Sign Noise Can Make Training More Stable." NeurIPS 2024 Workshops: OPT, 2024.
Markdown
[Elistratov et al. "Lion's Sign Noise Can Make Training More Stable." NeurIPS 2024 Workshops: OPT, 2024.](https://mlanthology.org/neuripsw/2024/elistratov2024neuripsw-lion/)
BibTeX
@inproceedings{elistratov2024neuripsw-lion,
  title = {{Lion's Sign Noise Can Make Training More Stable}},
  author = {Elistratov, Simon and Podivilov, Andrey and Iuzhakov, Timofei and Vetrov, Dmitry},
  booktitle = {NeurIPS 2024 Workshops: OPT},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/elistratov2024neuripsw-lion/}
}