Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting

Abstract

We investigate online nonsubmodular optimization with delayed feedback in the bandit setting, where the loss function is α-weakly DR-submodular and β-weakly DR-supermodular. Previous work has established an (α,β)-regret bound of O(nd^⅓T^⅔), where n is the dimensionality and d is the maximum delay. However, this regret bound depends on the maximum delay and is thus sensitive to irregular delays. Additionally, it couples the effects of delays and bandit feedback, since the bound is the product of the delay term and the O(nT^⅔) regret bound for the bandit setting without delayed feedback. In this paper, we develop two algorithms to address these limitations, respectively. First, we propose a novel method, namely DBGD-NF, which employs the one-point gradient estimator and utilizes all the available estimated gradients in each round to update the decision. It achieves a better O(nd̅^⅓T^⅔) regret bound, which depends on the average delay d̅ = 1/T ∑ₜ₌₁ᵀ dₜ.
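The abstract describes DBGD-NF only at a high level: estimate the gradient from a single bandit evaluation of the loss, queue the estimate until its delayed feedback arrives, and apply every estimate that becomes available in the current round. The sketch below illustrates that loop under stated assumptions; the spherical one-point estimator, the [0,1]ⁿ projection, the step-size/smoothing parameters, and the convention that round s feedback arrives at round s + dₛ − 1 are all illustrative choices, not details taken from the paper.

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    """One-point gradient estimator: g = (n/delta) * f(x + delta*u) * u,
    where u is drawn uniformly from the unit sphere. Only a single
    (bandit) evaluation of the loss f is used."""
    n = x.shape[0]
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    return (n / delta) * f(x + delta * u) * u

def dbgd_nf_sketch(losses, delays, n, eta=0.05, delta=0.1, seed=0):
    """Illustrative delayed bandit gradient-descent loop (not the paper's
    exact algorithm): in round t, build a one-point gradient estimate,
    queue it under its arrival round t + d_t - 1, then apply *all*
    estimates whose feedback arrives in round t."""
    rng = np.random.default_rng(seed)
    T = len(losses)
    x = np.zeros(n)
    pending = {}          # arrival round -> list of queued gradient estimates
    iterates = []
    for t in range(T):
        iterates.append(x.copy())
        g = one_point_gradient(losses[t], x, delta, rng)
        arrival = t + delays[t] - 1           # delays[t] >= 1 (d_t = 1: same round)
        pending.setdefault(arrival, []).append(g)
        for g_s in pending.pop(t, []):        # use every newly arrived estimate
            x = x - eta * g_s
        x = np.clip(x, 0.0, 1.0)              # projection onto [0,1]^n (example domain)
    return iterates
```

Note that the update consumes the whole batch of arrived estimates rather than only the most recent one, which is the feature the abstract highlights and the reason the bound scales with the average delay d̅ rather than the maximum delay d.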

Cite

Text

Yang et al. "Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I20.35507

Markdown

[Yang et al. "Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/yang2025aaai-online/) doi:10.1609/AAAI.V39I20.35507

BibTeX

@inproceedings{yang2025aaai-online,
  title     = {{Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting}},
  author    = {Yang, Sifan and Wan, Yuanyu and Zhang, Lijun},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {21992--22000},
  doi       = {10.1609/AAAI.V39I20.35507},
  url       = {https://mlanthology.org/aaai/2025/yang2025aaai-online/}
}