Optimal Complexity in Byzantine-Robust Distributed Stochastic Optimization with Data Heterogeneity

Abstract

In this paper, we establish tight lower bounds for Byzantine-robust distributed first-order stochastic methods in both strongly convex and non-convex stochastic optimization. We show that when the distributed nodes have heterogeneous data, the convergence error comprises two components: a non-vanishing Byzantine error and a vanishing optimization error. We establish lower bounds on the Byzantine error and on the minimum number of queries to a stochastic gradient oracle required to achieve an arbitrarily small optimization error. However, we also identify significant gaps between these lower bounds and the existing upper bounds. To close these gaps, we leverage Nesterov's acceleration and variance reduction to develop novel Byzantine-robust distributed stochastic optimization methods that provably match the lower bounds, up to at most logarithmic factors, implying that our lower bounds are tight.
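To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of a Byzantine-robust distributed stochastic gradient step: honest nodes report stochastic gradients, Byzantine nodes may report arbitrary vectors, and the server applies a robust aggregator before updating. The function names, the coordinate-wise trimmed mean aggregator, and the step size are illustrative assumptions; the accelerated, variance-reduced methods developed in the paper are more involved.

```python
import numpy as np

def trimmed_mean(grads, num_byzantine):
    """Coordinate-wise trimmed mean: in each coordinate, drop the largest and
    smallest `num_byzantine` values, then average the rest. One of several
    standard robust aggregators; the paper's aggregation rule may differ."""
    sorted_grads = np.sort(np.stack(grads), axis=0)          # shape (n, d)
    kept = sorted_grads[num_byzantine:len(grads) - num_byzantine]
    return kept.mean(axis=0)

def robust_distributed_sgd(grad_oracles, x0, num_byzantine,
                           step_size=0.1, num_iters=100):
    """Generic Byzantine-robust distributed SGD loop (illustrative only).

    grad_oracles: list of callables, one per node; each returns a stochastic
                  gradient at x. Byzantine nodes may return arbitrary vectors.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        grads = [oracle(x) for oracle in grad_oracles]       # collect reports
        g = trimmed_mean(grads, num_byzantine)               # robust aggregation
        x = x - step_size * g                                # descent step
    return x
```

With heterogeneous data, such a scheme converges only to a neighborhood of the solution: the non-vanishing Byzantine error in the abstract corresponds to the residual bias of the robust aggregator, while the optimization error is the part driven to zero by more oracle queries.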

Cite

Text

Shi et al. "Optimal Complexity in Byzantine-Robust Distributed Stochastic Optimization with Data Heterogeneity." Journal of Machine Learning Research, 2025.

Markdown

[Shi et al. "Optimal Complexity in Byzantine-Robust Distributed Stochastic Optimization with Data Heterogeneity." Journal of Machine Learning Research, 2025.](https://mlanthology.org/jmlr/2025/shi2025jmlr-optimal/)

BibTeX

@article{shi2025jmlr-optimal,
  title     = {{Optimal Complexity in Byzantine-Robust Distributed Stochastic Optimization with Data Heterogeneity}},
  author    = {Shi, Qiankun and Peng, Jie and Yuan, Kun and Wang, Xiao and Ling, Qing},
  journal   = {Journal of Machine Learning Research},
  year      = {2025},
  pages     = {1--58},
  volume    = {26},
  url       = {https://mlanthology.org/jmlr/2025/shi2025jmlr-optimal/}
}