Optimal Large-Scale Stochastic Optimization of NDCG Surrogates for Deep Learning

Abstract

In this paper, we introduce principled stochastic algorithms to efficiently optimize Normalized Discounted Cumulative Gain (NDCG) and its top-$K$ variant for deep models. To this end, we first propose novel compositional and bilevel compositional objectives for optimizing NDCG and top-$K$ NDCG, respectively. We then develop two stochastic algorithms to tackle these non-convex objectives, achieving an iteration complexity of $\mathcal{O}(\epsilon^{-4})$ for reaching an $\epsilon$-stationary point. Our methods employ moving average estimators to track the crucial inner functions for gradient computation, effectively reducing approximation errors. In addition, we introduce practical strategies such as initial warm-up and stop-gradient techniques to enhance performance in deep learning. Despite these advancements, the iteration complexity of these two algorithms does not meet the optimal $\mathcal{O}(\epsilon^{-3})$ for smooth non-convex optimization. To address this issue, we incorporate variance reduction techniques in our framework to more finely estimate the key functions, design new algorithmic mechanisms for solving multiple lower-level problems with parallel speed-up, and propose two types of algorithms. The first type directly tracks these functions with the variance-reduced estimators, while the second treats these functions as solutions to minimization problems and employs variance-reduced estimators to construct gradient estimators for solving these problems. We establish the optimal $\mathcal{O}(\epsilon^{-3})$ complexity for both types of algorithms. It is important to highlight that our algorithmic frameworks are versatile and can optimize a wide spectrum of metrics, including Precision@$K$/Recall@$K$, Average Precision (AP), mean Average Precision (mAP), and their top-$K$ variants. We further present efficient stochastic algorithms for optimizing these metrics with convergence guarantees. We conduct comprehensive experiments on multiple ranking tasks to verify the effectiveness of our proposed algorithms, which consistently surpass existing strong baselines.
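The core mechanism the abstract describes — tracking an inner function of a compositional objective $f(g(\mathbf{w}))$ with a moving-average estimator instead of a single noisy sample — can be illustrated with a minimal sketch. This is a toy one-dimensional example under assumed choices of $f$, $g$, and step sizes, not the paper's actual NDCG surrogate or algorithm:

```python
import random

def sgd_compositional(w, steps=2000, lr=0.05, beta=0.1, seed=0):
    """Toy stochastic compositional optimization: minimize f(g(w))
    with f(u) = u**2 and g(w) = E[w + noise] = w.

    Rather than plugging one noisy sample of g(w) into f'(.), we track
    g(w) with a moving average u (weight beta on the new sample), which
    reduces the error of the gradient estimate f'(u) * g'(w). All names
    and constants here are illustrative assumptions, not from the paper.
    """
    rng = random.Random(seed)
    u = 0.0  # moving-average estimate of the inner function g(w)
    for _ in range(steps):
        g_sample = w + rng.gauss(0.0, 1.0)      # noisy sample of g(w)
        u = (1.0 - beta) * u + beta * g_sample  # moving-average update
        grad = 2.0 * u                          # f'(u) * g'(w), g'(w) = 1
        w -= lr * grad
    return w
```

Starting from, say, `w = 5.0`, the iterate drifts toward the minimizer `w = 0` despite each individual sample of `g(w)` being very noisy; using `g_sample` directly in place of `u` would give a much noisier trajectory for the same step size.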

Cite

Text

Qiu et al. "Optimal Large-Scale Stochastic Optimization of NDCG Surrogates for Deep Learning." Machine Learning, 2025. doi:10.1007/S10994-024-06631-X

Markdown

[Qiu et al. "Optimal Large-Scale Stochastic Optimization of NDCG Surrogates for Deep Learning." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/qiu2025mlj-optimal/) doi:10.1007/S10994-024-06631-X

BibTeX

@article{qiu2025mlj-optimal,
  title     = {{Optimal Large-Scale Stochastic Optimization of NDCG Surrogates for Deep Learning}},
  author    = {Qiu, Zi-Hao and Hu, Quanqi and Zhong, Yongjian and Tu, Wei-Wei and Zhang, Lijun and Yang, Tianbao},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {42},
  doi       = {10.1007/S10994-024-06631-X},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/qiu2025mlj-optimal/}
}