A Second-Order-like Optimizer with Adaptive Gradient Scaling for Deep Learning
Abstract
In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with RMSprop adaptive gradient scaling. It leverages second-order information and rescaling while keeping the memory and compute requirements of standard deep learning methods such as AdamW or SGD. INNAprop is evaluated on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT. We also train GPT-2 on OpenWebText from scratch and fine-tune it with LoRA on E2E. INNAprop consistently performs on par with AdamW, and performs significantly better in our LLM training experiments, achieving faster convergence and higher accuracy with minimal hyperparameter tuning, even at large scale. Our code is publicly available.
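To make the combination concrete, below is a minimal, illustrative PyTorch sketch of an optimizer that couples an INNA-like inertial (second-order-like) update with RMSprop-style adaptive gradient scaling. This is not the authors' INNAprop: the class name, hyperparameters (alpha, beta, rms_decay), initialization of the auxiliary variable, and the exact coupling coefficients are assumptions made for illustration; the precise update rule is given in the paper.

```python
import torch


class InertialRMSScaledSGD(torch.optim.Optimizer):
    """Illustrative sketch only: an INNA-like inertial update combined with
    RMSprop-style gradient scaling. NOT the authors' INNAprop; the exact
    algorithm and coefficients are specified in the paper."""

    def __init__(self, params, lr=1e-3, alpha=0.5, beta=0.1,
                 rms_decay=0.99, eps=1e-8):
        defaults = dict(lr=lr, alpha=alpha, beta=beta,
                        rms_decay=rms_decay, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            lr, a, b = group["lr"], group["alpha"], group["beta"]
            decay, eps = group["rms_decay"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if len(state) == 0:
                    # Auxiliary (inertial) variable and RMSprop second-moment EMA.
                    # Initializing psi at the parameter value is a simplification.
                    state["psi"] = p.detach().clone()
                    state["sq_avg"] = torch.zeros_like(p)
                psi, sq_avg = state["psi"], state["sq_avg"]
                # RMSprop-style scaling: divide by the root of an EMA of squared gradients.
                sq_avg.mul_(decay).addcmul_(g, g, value=1 - decay)
                g_scaled = g / (sq_avg.sqrt() + eps)
                # INNA-like coupled update of (theta, psi); coefficients are assumptions.
                d_theta = (1.0 / b - a) * p - (1.0 / b) * psi - b * g_scaled
                d_psi = (1.0 / b - a) * p - (1.0 / b) * psi
                p.add_(d_theta, alpha=lr)
                psi.add_(d_psi, alpha=lr)
        return loss
```

The memory footprint matches standard adaptive methods: two extra buffers per parameter (the inertial variable and the squared-gradient EMA), so such an optimizer can be dropped into a training loop in place of AdamW, e.g. `opt = InertialRMSScaledSGD(model.parameters(), lr=1e-3)`.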
Cite
Text
Bolte et al. "A Second-Order-like Optimizer with Adaptive Gradient Scaling for Deep Learning." Transactions on Machine Learning Research, 2025.

Markdown
[Bolte et al. "A Second-Order-like Optimizer with Adaptive Gradient Scaling for Deep Learning." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/bolte2025tmlr-secondorderlike/)

BibTeX
@article{bolte2025tmlr-secondorderlike,
  title   = {{A Second-Order-like Optimizer with Adaptive Gradient Scaling for Deep Learning}},
  author  = {Bolte, Jerome and Boustany, Ryan and Pauwels, Edouard and Purica, Andrei},
  journal = {Transactions on Machine Learning Research},
  year    = {2025},
  url     = {https://mlanthology.org/tmlr/2025/bolte2025tmlr-secondorderlike/}
}