Variational Learning Is Effective for Large Deep Networks

Abstract

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON’s computational costs are nearly identical to Adam’s, but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Code is available at https://github.com/team-approx-bayes/ivon.

Cite

Text

Shen et al. "Variational Learning Is Effective for Large Deep Networks." International Conference on Machine Learning, 2024.

Markdown

[Shen et al. "Variational Learning Is Effective for Large Deep Networks." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/shen2024icml-variational/)

BibTeX

@inproceedings{shen2024icml-variational,
  title     = {{Variational Learning Is Effective for Large Deep Networks}},
  author    = {Shen, Yuesong and Daheim, Nico and Cong, Bai and Nickl, Peter and Marconi, Gian Maria and Raoul, Bazan Clement Emile Marcel and Yokota, Rio and Gurevych, Iryna and Cremers, Daniel and Khan, Mohammad Emtiyaz and Möllenhoff, Thomas},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {44665--44686},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/shen2024icml-variational/}
}