Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization

Abstract

We present Re-weighted Gradient Descent (RGD), a novel optimization technique that improves the performance of deep neural networks through dynamic sample re-weighting. Leveraging insights from distributionally robust optimization (DRO) with a Kullback-Leibler divergence constraint, our method assigns importance weights to training samples at each optimization step. RGD is simple to implement, computationally efficient, and compatible with widely used optimizers such as SGD and Adam. We demonstrate the effectiveness of RGD on various learning tasks, including supervised learning, meta-learning, and out-of-domain generalization. Notably, RGD achieves state-of-the-art results on diverse benchmarks, with improvements of +0.7% on DomainBed, +1.44% on tabular classification, +1.94% on GLUE with BERT, and +1.01% on ImageNet-1K with ViT.
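To illustrate the idea of loss-based re-weighting derived from KL-constrained DRO, here is a minimal PyTorch-style sketch, not the authors' exact implementation: the function name `rgd_step`, the `temperature` parameter, and the softmax normalization over the mini-batch are illustrative assumptions. The key point it shows is that harder examples (higher per-sample loss) receive exponentially larger weight in the gradient update.

```python
import torch
import torch.nn.functional as F

def rgd_step(model, optimizer, inputs, targets, temperature=1.0):
    """One re-weighted gradient step (illustrative sketch).

    Samples with higher loss receive exponentially larger weights,
    as suggested by DRO with a KL-divergence constraint.
    """
    optimizer.zero_grad()
    logits = model(inputs)
    # Per-sample losses, no reduction yet.
    per_sample_loss = F.cross_entropy(logits, targets, reduction="none")
    # Exponential-tilting weights over the mini-batch; detached so the
    # weights themselves are not differentiated through.
    weights = torch.softmax(per_sample_loss.detach() / temperature, dim=0)
    # Weighted loss (weights sum to 1 over the batch), then a standard step.
    loss = (weights * per_sample_loss).sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the re-weighting only rescales per-sample loss contributions before the backward pass, this kind of step plugs into any standard optimizer (SGD, Adam) without further changes.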

Cite

Text

Kumar et al. "Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization." Transactions on Machine Learning Research, 2024.

Markdown

[Kumar et al. "Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/kumar2024tmlr-stochastic/)

BibTeX

@article{kumar2024tmlr-stochastic,
  title     = {{Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization}},
  author    = {Kumar, Ramnath and Majmundar, Kushal Alpesh and Nagaraj, Dheeraj Mysore and Suggala, Arun},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/kumar2024tmlr-stochastic/}
}