Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback

Abstract

We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, losses of an action are split into d components, spreading over consecutive rounds after the action is chosen. And in each round, the algorithm observes the aggregation of losses that come from the latest d rounds. Previous works focus on oblivious adversarial setting, while we investigate the harder nonoblivious setting. We show nonoblivious setting incurs Omega(T) pseudo regret even when the loss sequence is bounded memory. However, we propose a wrapper algorithm which enjoys o(T) policy regret on many adversarial bandit problems with the assumption that the loss sequence is bounded memory. Especially, for K armed bandit and bandit convex optimization, our policy regret bound is in the order of T to the two third. We also prove a matching lower bound for K armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is nonoblivious. It answers the open problem proposed in [Wang, Wang, Huang 2021], showing that nonoblivious delay is enough to incur the regret in the order of T to the two third.

Cite

Text

Wan et al. "Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/486

Markdown

[Wan et al. "Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/wan2022ijcai-bounded/) doi:10.24963/IJCAI.2022/486

BibTeX

@inproceedings{wan2022ijcai-bounded,
  title     = {{Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback}},
  author    = {Wan, Zongqi and Sun, Xiaoming and Zhang, Jialin},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {3501-3507},
  doi       = {10.24963/IJCAI.2022/486},
  url       = {https://mlanthology.org/ijcai/2022/wan2022ijcai-bounded/}
}