Debiasing Mini-Batch Quadratics for Applications in Deep Learning

Abstract

Quadratic approximations form a fundamental building block of machine learning methods. For example, second-order optimizers try to find the Newton step to the minimum of a local quadratic proxy of the objective function, and the second-order approximation of a network's loss function can be used to quantify the uncertainty of its outputs via the Laplace approximation. When computations on the entire training set are intractable (typical for deep learning), the relevant quantities are computed on mini-batches. This, however, distorts and biases the shape of the associated *stochastic* quadratic approximations in an intricate way with detrimental effects on applications. In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance for second-order optimization and uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies.
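To make the setting of the abstract concrete, here is a minimal sketch (not the paper's debiasing method) of the objects involved: a quadratic proxy built from a gradient and Hessian, its Newton step, and how a mini-batch version of that proxy becomes a stochastic, and potentially biased, estimate of the full-batch one. The toy least-squares objective and all function names (`grad_and_hessian`, `newton_step`) are assumptions made for this illustration.

```python
# Illustration only: full-batch vs. mini-batch quadratic proxy and Newton step.
import numpy as np

rng = np.random.default_rng(0)
N, D, B = 10_000, 5, 32          # dataset size, parameter dimension, mini-batch size
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)
theta0 = np.zeros(D)             # expansion point of the quadratic proxy

def grad_and_hessian(Xs, ys, theta):
    """Gradient and Hessian of the mean squared error on a (mini-)batch."""
    r = Xs @ theta - ys
    g = Xs.T @ r / len(ys)
    H = Xs.T @ Xs / len(ys)
    return g, H

def newton_step(g, H, damping=1e-3):
    """Step to the minimum of the local quadratic proxy q(theta)."""
    return -np.linalg.solve(H + damping * np.eye(len(g)), g)

# Full-batch quadratic vs. a mini-batch quadratic around the same point.
g_full, H_full = grad_and_hessian(X, y, theta0)
idx = rng.choice(N, size=B, replace=False)
g_batch, H_batch = grad_and_hessian(X[idx], y[idx], theta0)

# Because g_batch and H_batch are estimated on the *same* mini-batch, the
# resulting quadratic (and its Newton step) is a distorted, biased version of
# the full-batch quadratic -- the effect the paper analyzes and corrects.
print("full-batch step :", newton_step(g_full, H_full))
print("mini-batch step :", newton_step(g_batch, H_batch))
```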

Cite

Text

Tatzel et al. "Debiasing Mini-Batch Quadratics for Applications in Deep Learning." International Conference on Learning Representations, 2025.

Markdown

[Tatzel et al. "Debiasing Mini-Batch Quadratics for Applications in Deep Learning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/tatzel2025iclr-debiasing/)

BibTeX

@inproceedings{tatzel2025iclr-debiasing,
  title     = {{Debiasing Mini-Batch Quadratics for Applications in Deep Learning}},
  author    = {Tatzel, Lukas and Mucsányi, Bálint and Hackel, Osane and Hennig, Philipp},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/tatzel2025iclr-debiasing/}
}