$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

Abstract

Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to *meta-generalize*, that is, to optimize unseen tasks, especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization ($\mu$P) for two state-of-the-art learned optimizer architectures and propose a simple meta-training recipe for $\mu$-parametrized LOs ($\mu$LOs). Our empirical evaluation demonstrates that LOs meta-trained with our recipe substantially improve meta-generalization to wider unseen tasks when compared to LOs trained under standard parametrization (SP) using the same compute budget. We also empirically observe that, compared to SP LOs, $\mu$LOs exhibit unexpectedly improved meta-generalization to deeper networks ($5\times$ the meta-training depth) and surprising generalization to much longer training horizons ($25\times$ the meta-training horizon).

Cite

Text

Thérien et al. "$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers." International Conference on Learning Representations, 2026.

Markdown

[Thérien et al. "$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/therien2026iclr-lo/)

BibTeX

@inproceedings{therien2026iclr-lo,
  title     = {{$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers}},
  author    = {Thérien, Benjamin and Joseph, Charles-Étienne and Knyazev, Boris and Oyallon, Edouard and Rish, Irina and Belilovsky, Eugene},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/therien2026iclr-lo/}
}