Distributionally Robust Optimization with Bias and Variance Reduction
Abstract
We consider the distributionally robust optimization (DRO) problem, wherein a learner optimizes the worst-case empirical risk achievable by reweighing the observed training examples. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparameter, and prove that it enjoys linear convergence for smooth regularized losses. This contrasts with previous algorithms that either require tuning multiple hyperparameters or potentially fail to converge due to biased gradient estimates or inadequate regularization. Empirically, we show that Prospect can converge 2-3x faster than baselines such as SGD and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains.
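The abstract describes the DRO objective informally as the worst-case empirical risk over reweightings of the training examples. As a point of reference, here is a minimal sketch of a generic min-max objective of that form; the notation ($w$, $\ell_i$, $q$, the uncertainty set $\mathcal{Q}$, and the regularizer $\nu$) is illustrative and is not taken from the paper, whose exact uncertainty set and penalty are not specified in this abstract.

```latex
% Illustrative min-max DRO objective (notation assumed, not the paper's exact
% formulation): w are model parameters, \ell_i(w) is the loss on training
% example i, q reweights the n examples within an uncertainty set Q contained
% in the probability simplex, and (nu/2)||w||^2 is an assumed regularizer.
\[
  \min_{w \in \mathbb{R}^d} \;
  \max_{q \in \mathcal{Q} \subseteq \Delta_n} \;
  \sum_{i=1}^{n} q_i \, \ell_i(w) \;+\; \frac{\nu}{2}\,\lVert w \rVert_2^2,
  \qquad
  \Delta_n = \Bigl\{ q \in \mathbb{R}_{\ge 0}^{n} : \textstyle\sum_{i=1}^{n} q_i = 1 \Bigr\}.
\]
```

Setting $\mathcal{Q} = \{(1/n, \dots, 1/n)\}$ recovers ordinary empirical risk minimization, so the choice of $\mathcal{Q}$ controls how adversarial the reweighting is.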
Cite
Text
Mehta et al. "Distributionally Robust Optimization with Bias and Variance Reduction." International Conference on Learning Representations, 2024.
Markdown
[Mehta et al. "Distributionally Robust Optimization with Bias and Variance Reduction." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/mehta2024iclr-distributionally/)
BibTeX
@inproceedings{mehta2024iclr-distributionally,
title = {{Distributionally Robust Optimization with Bias and Variance Reduction}},
author = {Mehta, Ronak and Roulet, Vincent and Pillutla, Krishna and Harchaoui, Zaid},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/mehta2024iclr-distributionally/}
}