Two Losses Are Better than One: Faster Optimization Using a Cheaper Proxy

Abstract

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal-point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
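To make the idea above concrete, here is a minimal sketch of the general recipe the abstract describes: approximate proximal-point steps on a cheap proxy, corrected with an occasional stochastic gradient of the expensive objective. The specific update rule, inner solver, step sizes, and function names (proxy_proximal_point, grad_f_stochastic, grad_h) are illustrative assumptions for this page, not the paper's exact algorithm or constants.

# Sketch (assumed, not the paper's specification): each outer step uses one
# stochastic gradient of the expensive objective f, then approximately solves
# a corrected proximal subproblem on the cheap proxy h.
import numpy as np

def proxy_proximal_point(grad_f_stochastic, grad_h, x0, eta=0.1,
                         outer_steps=100, inner_steps=50, inner_lr=0.01):
    """Approximately solve, at each outer step t,
        min_x  h(x) + <g_t - grad_h(x_t), x> + (1 / (2*eta)) * ||x - x_t||^2
    where g_t is a stochastic gradient of f at x_t, using a few cheap
    gradient steps on the proxy h."""
    x = np.asarray(x0, dtype=float)
    for _ in range(outer_steps):
        x_t = x.copy()
        g_t = grad_f_stochastic(x_t)      # expensive gradient, used once per outer step
        correction = g_t - grad_h(x_t)    # aligns the proxy with f at x_t
        for _ in range(inner_steps):      # cheap inner loop on the proxy only
            subgrad = grad_h(x) + correction + (x - x_t) / eta
            x = x - inner_lr * subgrad
    return x

if __name__ == "__main__":
    # Toy example: f is a noisy quadratic, h is a slightly mis-specified quadratic proxy,
    # so f - h is smooth with a small constant, the regime the abstract discusses.
    rng = np.random.default_rng(0)
    A = np.array([[3.0, 0.2], [0.2, 1.0]])
    b = np.array([1.0, -2.0])
    grad_f = lambda x: A @ x - b + 0.01 * rng.standard_normal(2)   # stochastic gradient of f
    grad_h = lambda x: (A + 0.1 * np.eye(2)) @ x - b               # cheap proxy gradient
    x_star = np.linalg.solve(A, b)
    x_hat = proxy_proximal_point(grad_f, grad_h, x0=np.zeros(2))
    print("distance to minimizer:", np.linalg.norm(x_hat - x_star))

In this toy run the iterate approaches the minimizer of f while calling grad_f only once per outer step, with all remaining work done on the proxy; the actual method and rates are given in the paper.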

Cite

Text

Woodworth et al. "Two Losses Are Better than One: Faster Optimization Using a Cheaper Proxy." International Conference on Machine Learning, 2023.

Markdown

[Woodworth et al. "Two Losses Are Better than One: Faster Optimization Using a Cheaper Proxy." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/woodworth2023icml-two/)

BibTeX

@inproceedings{woodworth2023icml-two,
  title     = {{Two Losses Are Better than One: Faster Optimization Using a Cheaper Proxy}},
  author    = {Woodworth, Blake and Mishchenko, Konstantin and Bach, Francis},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {37273--37292},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/woodworth2023icml-two/}
}