On the Utility of Equal Batch Sizes for Inference in Stochastic Gradient Descent
Abstract
Stochastic gradient descent (SGD) is an estimation tool for large data employed in machine learning and statistics. Due to the Markovian nature of the SGD process, inference is a challenging problem. An underlying asymptotic normality of the averaged SGD (ASGD) estimator allows for the construction of a batch-means estimator of the asymptotic covariance matrix. Instead of the usual increasing batch-size strategy, we propose a memory-efficient equal batch-size strategy and show that, under mild conditions, the batch-means estimator is consistent. A key feature of the proposed batching technique is that it allows for bias correction of the variance at no additional cost to memory. Further, since joint inference for large-dimensional problems may be undesirable, we present marginal-friendly simultaneous confidence intervals and show through an example how covariance estimators of ASGD can be employed for improved predictions.
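The central object described above is a batch-means estimator of the asymptotic covariance of the ASGD iterate built from batches of equal size. The snippet below is a minimal illustrative sketch of that general construction, not the paper's exact estimator or its bias correction; the linear-regression data, step-size schedule, and the choice of batch size (roughly the square root of the sample size) are assumptions made purely for the example.

```python
# Minimal sketch (assumed example, not the authors' exact procedure):
# averaged SGD on a linear-regression loss, then an equal batch-size
# batch-means estimate of the asymptotic covariance of the ASGD iterate.
import numpy as np

rng = np.random.default_rng(0)

# Toy streaming data: y = x @ beta_true + noise (illustrative assumption)
d, n = 3, 50_000
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ beta_true + rng.normal(size=n)

# SGD with Polyak-Ruppert averaging; step size eta_t = c * t^{-alpha}
c, alpha = 0.5, 0.51
theta = np.zeros(d)
iterates = np.empty((n, d))
for t in range(n):
    eta = c * (t + 1) ** (-alpha)
    grad = (X[t] @ theta - y[t]) * X[t]   # per-observation gradient
    theta -= eta * grad
    iterates[t] = theta
asgd = iterates.mean(axis=0)              # averaged SGD (ASGD) estimator

# Equal batch-size batch-means estimator: split the iterate sequence into
# a_n batches of common size b_n, average within each batch, and scale the
# sample covariance of the batch means.
b_n = int(np.floor(np.sqrt(n)))           # one simple equal batch-size choice
a_n = n // b_n
batch_means = iterates[: a_n * b_n].reshape(a_n, b_n, d).mean(axis=1)
centered = batch_means - batch_means.mean(axis=0)
sigma_hat = (b_n / (a_n - 1)) * centered.T @ centered

# Marginal standard errors implied by the covariance estimate
se = np.sqrt(np.diag(sigma_hat) / n)
print("ASGD estimate:  ", asgd)
print("standard errors:", se)
```

Because every batch has the same size, the batch means can be accumulated with a fixed, small memory footprint as the stream is processed, which is the memory advantage the abstract points to relative to increasing batch-size schemes.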
Cite
BibTeX
@article{singh2025jmlr-utility,
  title   = {{On the Utility of Equal Batch Sizes for Inference in Stochastic Gradient Descent}},
  author  = {Singh, Rahul and Shukla, Abhinek and Vats, Dootika},
  journal = {Journal of Machine Learning Research},
  year    = {2025},
  pages   = {1--41},
  volume  = {26},
  url     = {https://mlanthology.org/jmlr/2025/singh2025jmlr-utility/}
}