Online Learning and Information Exponents: The Importance of Batch Size & Time/Complexity Tradeoffs

Abstract

We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We determine the batch size that minimizes the iteration time as a function of the hardness of the target, as measured by its information exponents. We show that performing gradient updates with large batches $n_b \lesssim d^{\frac{\ell}{2}}$ minimizes the training time without changing the total sample complexity, where $\ell$ is the information exponent of the target to be learned and $d$ is the input dimension. However, increasing the batch size beyond this scale, $n_b \gg d^{\frac{\ell}{2}}$, is detrimental to improving the time complexity of SGD. We provably overcome this fundamental limitation via a different training protocol, Correlation loss SGD, which suppresses the auto-correlation terms in the loss function. We show that one can track the training progress by a system of low-dimensional ordinary differential equations (ODEs). Finally, we validate our theoretical results with numerical experiments.
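To make the setting concrete, the following is a minimal sketch of one-pass mini-batch SGD on a two-layer network. It is not the authors' code. Every specific choice is an assumption made for illustration: the single-index target $y = z^2 - 1$ with $z = \langle w^\star, x \rangle$, the ReLU activation, the fixed second layer, the renormalization of the first-layer weights after each step, and the function name `one_pass_sgd`. The correlation loss is read here as the square loss with the term quadratic in the network output dropped, i.e. $-y f(x)$, which is one plausible interpretation of "suppressing the auto-correlation terms".

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def one_pass_sgd(d=64, p=4, n_b=32, steps=100, lr=0.1, loss="correlation", seed=0):
    """One-pass mini-batch SGD on the first layer of a two-layer network.

    Hypothetical setup: y = <w_star, x>^2 - 1 with isotropic Gaussian x.
    Each step draws a fresh batch, so no sample is ever reused.
    """
    rng = np.random.default_rng(seed)
    w_star = np.zeros(d)
    w_star[0] = 1.0                               # hidden target direction
    W = rng.standard_normal((p, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm hidden weights
    a = np.full(p, 1.0 / p)                        # second layer kept fixed
    samples_used = 0
    for _ in range(steps):
        X = rng.standard_normal((n_b, d))          # fresh batch of n_b samples
        y = (X @ w_star) ** 2 - 1.0
        pre = X @ W.T                              # pre-activations, shape (n_b, p)
        if loss == "correlation":
            err = -y                               # dL/df for L = -y * f
        else:
            err = relu(pre) @ a - y                # dL/df for L = 0.5 * (f - y)^2
        # Batch-averaged gradient w.r.t. each hidden weight vector, shape (p, d).
        G = ((err[:, None] * (pre > 0)) * a).T @ X / n_b
        W -= lr * G
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # project back to the sphere
        samples_used += n_b
    overlaps = W @ w_star                          # alignment of each neuron with w_star
    return W, overlaps, samples_used
```

In this sketch the total number of samples consumed is `n_b * steps`, so at a fixed sample budget a larger batch size means proportionally fewer iterations. That is the time/complexity tradeoff the abstract describes for $n_b \lesssim d^{\frac{\ell}{2}}$.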

Cite

Text

Arnaboldi et al. "Online Learning and Information Exponents: The Importance of Batch Size & Time/Complexity Tradeoffs." International Conference on Machine Learning, 2024.

Markdown

[Arnaboldi et al. "Online Learning and Information Exponents: The Importance of Batch Size & Time/Complexity Tradeoffs." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/arnaboldi2024icml-online/)

BibTeX

@inproceedings{arnaboldi2024icml-online,
  title     = {{Online Learning and Information Exponents: The Importance of Batch Size \& Time/Complexity Tradeoffs}},
  author    = {Arnaboldi, Luca and Dandi, Yatin and Krzakala, Florent and Loureiro, Bruno and Pesce, Luca and Stephan, Ludovic},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {1730--1762},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/arnaboldi2024icml-online/}
}