Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems Without First-Order Gradient

Abstract

Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance, the coordinate-wise variance, which stems from random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands $\mathcal{O}(d)$ function evaluations (where $d$ is the problem dimension), which incurs substantial computational cost and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction ($\texttt{ZPDVR}$) method, which uses the averaging trick to reduce both the sampling and coordinate-wise variances. Compared to prior methods, $\texttt{ZPDVR}$ relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) only $\mathcal{O}(1)$ times per iteration in expectation, and achieves the optimal $\mathcal{O}(d(n + \kappa)\log (\frac{1}{\epsilon}))$ SZO query complexity in the strongly convex and smooth setting, where $\kappa$ is the condition number and $\epsilon$ is the desired accuracy. Empirical results validate $\texttt{ZPDVR}$’s linear convergence and demonstrate its superior performance over related methods.
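
The abstract refers to random gradient estimation and the coordinate-wise variance it introduces. As a rough illustration (not the authors' ZPDVR update), below is a minimal Python sketch of a generic two-point random-direction gradient estimator; the Gaussian direction, the smoothing radius mu, and the toy quadratic are assumptions made purely for this example.

import numpy as np

def zo_grad_estimate(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order estimate of grad f(x) along one random direction.

    Illustrative sketch only; the direction distribution and the smoothing
    radius `mu` are assumptions, not the ZPDVR construction from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)  # random direction
    # Finite difference along u, scaled back onto u. Each single estimate is
    # noisy; this extra noise is the coordinate-wise variance the abstract
    # discusses, on top of the usual sampling variance over data points.
    return (f(x + mu * u) - f(x)) / mu * u

# Toy usage: averaging many estimates of grad(0.5 * ||x||^2) approximately recovers x.
f = lambda z: 0.5 * float(np.dot(z, z))
x = np.ones(5)
avg = np.mean([zo_grad_estimate(f, x) for _ in range(2000)], axis=0)
print(avg)  # approximately [1, 1, 1, 1, 1]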

Cite

Text

Di et al. "Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems Without First-Order Gradient." International Conference on Machine Learning, 2024.

Markdown

[Di et al. "Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems Without First-Order Gradient." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/di2024icml-double-a/)

BibTeX

@inproceedings{di2024icml-double-a,
  title     = {{Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems Without First-Order Gradient}},
  author    = {Di, Hao and Ye, Haishan and Zhang, Yueling and Chang, Xiangyu and Dai, Guang and Tsang, Ivor},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {10792--10810},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/di2024icml-double-a/}
}