Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent
Abstract
In recent years, a growing number of works have studied the generalization properties of stochastic gradient descent (SGD) from the perspective of algorithmic stability. However, few of them study generalization and optimization simultaneously in the non-convex setting, especially for pairwise SGD with heavy-tailed gradient noise. This paper considers the impact of heavy-tailed gradient noise obeying a sub-Weibull distribution on the stability-based learning guarantees for non-convex pairwise SGD by investigating its generalization and optimization jointly. Specifically, based on two novel pairwise uniform model stability tools, we first bound the generalization error of pairwise SGD in the general non-convex setting after establishing the quantitative relationships between stability and generalization error. Then, we consider the practical heavy-tailed sub-Weibull gradient noise condition to establish a refined generalization bound without the bounded gradient condition. Finally, sharper error bounds for generalization and optimization are derived by introducing the gradient dominance condition. Comparing these results reveals that sub-Weibull gradient noise brings some positive dependencies on the heavy-tailed strength for generalization and optimization. Furthermore, we extend our analysis to the corresponding pairwise minibatch SGD and derive the first stability-based near-optimal generalization and optimization bounds, which are consistent with many empirical observations.
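For readers unfamiliar with the pairwise setting, the following is a minimal sketch of a pairwise SGD loop, in which each update uses the gradient of a loss evaluated on a pair of training examples rather than a single one. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `pairwise_loss_grad` and `pairwise_sgd`, the linear scoring model, the pairwise logistic loss (a convex surrogate, whereas the paper treats the non-convex case), and the step size `eta0 / sqrt(t)`. Heavy-tailed gradient noise is not modeled here.

```python
import math
import random


def _sigmoid(m):
    """Numerically stable logistic function 1 / (1 + exp(-m))."""
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)


def pairwise_loss_grad(w, z_i, z_j):
    """Gradient in w of log(1 + exp(-s * <w, x_i - x_j>)), s = sign(y_i - y_j).

    Illustrative loss only; the paper does not prescribe this choice.
    """
    (x_i, y_i), (x_j, y_j) = z_i, z_j
    s = (y_i > y_j) - (y_i < y_j)  # +1, -1, or 0 for tied labels
    diff = [a - b for a, b in zip(x_i, x_j)]
    margin = s * sum(wk * dk for wk, dk in zip(w, diff))
    # d/dm log(1 + exp(-m)) = -sigmoid(-m); the chain rule adds the factor s * diff
    coeff = -s * _sigmoid(-margin)
    return [coeff * dk for dk in diff]


def pairwise_sgd(data, dim, steps=2000, eta0=0.5, seed=0):
    """Run pairwise SGD: sample a pair of distinct examples, take one gradient step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for t in range(1, steps + 1):
        i, j = rng.sample(range(len(data)), 2)  # distinct indices, i != j
        g = pairwise_loss_grad(w, data[i], data[j])
        eta = eta0 / math.sqrt(t)  # assumed decaying step size
        w = [wk - eta * gk for wk, gk in zip(w, g)]
    return w


# Toy ranking data: the label order is determined by the first feature alone.
toy_data = [([k / 10.0, 0.0], k) for k in range(10)]
w_hat = pairwise_sgd(toy_data, dim=2)
```

On this toy data the learned weight on the first feature should be positive and the weight on the second (constant) feature should stay at zero, since only the first feature differs within any pair.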
Cite
Text
Chen et al. "Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I15.33735
Markdown
[Chen et al. "Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/chen2025aaai-error/) doi:10.1609/AAAI.V39I15.33735
BibTeX
@inproceedings{chen2025aaai-error,
title = {{Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent}},
author = {Chen, Jun and Chen, Hong and Gu, Bin and Liu, Guodong and Wang, Yingjie and Li, Weifu},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {15803--15811},
doi = {10.1609/AAAI.V39I15.33735},
url = {https://mlanthology.org/aaai/2025/chen2025aaai-error/}
}