$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Abstract
Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks and thereby lower training costs. However, they can struggle to optimize unseen tasks (i.e., to meta-generalize), especially when training networks much larger than those seen during meta-training. To address this, we derive the Maximal Update Parametrization ($\mu$P) for two popular learned optimizer architectures and propose a simple meta-training recipe for $\mu$-parameterized LOs ($\mu$LOs). Our empirical evaluation demonstrates that LOs meta-trained with our recipe substantially improve meta-generalization to wider unseen tasks compared to LOs trained under standard parametrization (e.g., as they are trained in existing work). When applying our $\mu$LOs, each trained for less than 250 GPU-hours, to large-width models, we are often able to match or exceed the performance of pre-trained VeLO, the most performant publicly available learned optimizer, which was meta-trained with 4000 TPU-months of compute. We also observe empirically that learned optimizers trained with our $\mu$LO recipe exhibit substantially improved meta-generalization to deeper networks ($5\times$ the meta-training depth) and remarkable generalization to much longer training horizons ($25\times$ the meta-training horizon).
Cite
Text
Thérien et al. "$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers." NeurIPS 2024 Workshops: OPT, 2024.
Markdown
[Thérien et al. "$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers." NeurIPS 2024 Workshops: OPT, 2024.](https://mlanthology.org/neuripsw/2024/therien2024neuripsw-lo/)
BibTeX
@inproceedings{therien2024neuripsw-lo,
title = {{$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers}},
author = {Thérien, Benjamin and Joseph, Charles-Étienne and Knyazev, Boris and Oyallon, Edouard and Rish, Irina and Belilovsky, Eugene},
booktitle = {NeurIPS 2024 Workshops: OPT},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/therien2024neuripsw-lo/}
}