Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Abstract
Quadratic approximations form a fundamental building block of machine learning methods. For example, second-order optimizers try to find the Newton step toward the minimum of a local quadratic proxy to the objective function; and the second-order approximation of a network's loss function can be used to quantify the uncertainty of its outputs via the Laplace approximation. When computations on the entire training set are intractable (as is typical in deep learning), the relevant quantities are computed on mini-batches. This, however, distorts and biases the shape of the associated *stochastic* quadratic approximations in an intricate way, with detrimental effects on applications. In this paper, we (i) show that this bias introduces a systematic error, (ii) provide a theoretical explanation for it, (iii) explain its relevance for second-order optimization and uncertainty quantification via the Laplace approximation in deep learning, and (iv) develop and evaluate debiasing strategies.
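To make the setting concrete, the following is a minimal illustrative sketch (not the paper's method or data) of how a quadratic model built on a mini-batch deviates from its full-batch counterpart. It uses a toy least-squares loss with NumPy; all names, dimensions, and the batch size are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative sketch: compare the Newton step of the full-batch quadratic
# model with the Newton step of a quadratic model built from one mini-batch.
# The mini-batch gradient and Hessian are noisy, so the minimizer of the
# mini-batch quadratic deviates from the full-batch Newton step.
rng = np.random.default_rng(0)
N, D, B = 1024, 5, 32                       # dataset size, parameter dim, batch size (arbitrary)
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)
theta = rng.normal(size=D)                  # current iterate

def grad_hess(Xs, ys):
    """Gradient and Hessian of the mean squared error at theta."""
    r = Xs @ theta - ys
    g = Xs.T @ r / len(ys)
    H = Xs.T @ Xs / len(ys)
    return g, H

g_full, H_full = grad_hess(X, y)
step_full = -np.linalg.solve(H_full, g_full)        # full-batch Newton step

idx = rng.choice(N, size=B, replace=False)          # draw one mini-batch
g_mb, H_mb = grad_hess(X[idx], y[idx])
step_mb = -np.linalg.solve(H_mb, g_mb)              # mini-batch Newton step

print("deviation of mini-batch Newton step:", np.linalg.norm(step_mb - step_full))
```

In this toy example the deviation is purely stochastic noise; the paper's point is that for the quantities arising in deep learning, the mini-batch quadratic is additionally *biased* in a systematic way, which is what the proposed debiasing strategies address.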
Cite
Text
Tatzel et al. "Debiasing Mini-Batch Quadratics for Applications in Deep Learning." International Conference on Learning Representations, 2025.
Markdown
[Tatzel et al. "Debiasing Mini-Batch Quadratics for Applications in Deep Learning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/tatzel2025iclr-debiasing/)
BibTeX
@inproceedings{tatzel2025iclr-debiasing,
title = {{Debiasing Mini-Batch Quadratics for Applications in Deep Learning}},
author = {Tatzel, Lukas and Mucsányi, Bálint and Hackel, Osane and Hennig, Philipp},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/tatzel2025iclr-debiasing/}
}