Scaling-up Robust Gradient Descent Techniques

Abstract

We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we choose a candidate which does not diverge too far from the majority of cheap stochastic sub-processes run over partitioned data. This lets us retain the formal strength of RGD methods at a fraction of the cost.
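The selection idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the squared-error loss, the single-pass SGD sub-process, the hypothetical function names (`sgd_least_squares`, `select_near_majority`, `divide_and_select`), and the selection rule, which picks the candidate with the smallest median Euclidean distance to all candidates as one simple way to stay close to the majority.

```python
import numpy as np


def sgd_least_squares(X, y, step=0.01, seed=0):
    """Cheap sub-process: one pass of plain SGD on squared error (assumed loss)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for i in rng.permutation(len(y)):
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x @ w - y)**2
        w -= step * grad
    return w


def select_near_majority(candidates):
    """Pick the candidate whose median distance to all candidates is smallest.

    The median over a row (which includes the zero self-distance) acts as the
    radius of a ball around that candidate holding roughly half of the
    candidates, so an outlying candidate gets a large radius and is not chosen.
    """
    C = np.asarray(candidates, dtype=float)
    dists = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    radii = np.median(dists, axis=1)
    return C[np.argmin(radii)]


def divide_and_select(X, y, k=5, step=0.01, seed=0):
    """Partition the data into k disjoint subsets, run SGD on each, then select."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), k)
    candidates = [
        sgd_least_squares(X[p], y[p], step=step, seed=seed + j)
        for j, p in enumerate(parts)
    ]
    return select_near_majority(candidates)


if __name__ == "__main__":
    # Synthetic regression data with heavy-tailed (Student-t) noise.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + rng.standard_t(df=3, size=2000)
    print(divide_and_select(X, y, k=5))
```

In this sketch no robust aggregation happens inside the gradient steps; the only robust operation is the single selection over `k` candidates at the end, which is where the cost saving relative to per-step robust aggregation would come from.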

Cite

Text

Holland. "Scaling-up Robust Gradient Descent Techniques." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I9.16940

Markdown

[Holland. "Scaling-up Robust Gradient Descent Techniques." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/holland2021aaai-scaling/) doi:10.1609/AAAI.V35I9.16940

BibTeX

@inproceedings{holland2021aaai-scaling,
  title     = {{Scaling-up Robust Gradient Descent Techniques}},
  author    = {Holland, Matthew J.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {7694--7701},
  doi       = {10.1609/AAAI.V35I9.16940},
  url       = {https://mlanthology.org/aaai/2021/holland2021aaai-scaling/}
}