Near Optimal Adversarial Attack on UCB Bandits

Abstract

I study a stochastic multi-armed bandit problem where rewards are subject to adversarial corruption. At each round, the learner chooses an arm and a stochastic reward is generated; the adversary strategically adds corruption to the reward, and the learner only observes the corrupted reward. I propose a novel attack strategy that manipulates a learner employing the upper-confidence-bound (UCB) algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near optimal.
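For intuition, below is a minimal sketch of the corrupted-bandit protocol described in the abstract: a UCB1 learner interacts with arms whose observed rewards an adversary may corrupt to steer pulls toward a target arm. The attack rule shown (depressing non-target rewards so their corrupted empirical mean stays below the target arm's) is a simple illustrative baseline, not the paper's near-optimal strategy, and all constants, names, and parameters here are assumptions made for the example.

```python
# Sketch of the corrupted-bandit interaction: a UCB1 learner vs. a reward-poisoning
# adversary that wants a suboptimal target arm pulled. Illustrative only; this is
# NOT the attack strategy analyzed in the paper.
import math
import random

K = 3                          # number of arms (assumed)
TARGET = 2                     # non-optimal arm the adversary targets (assumed)
TRUE_MEANS = [0.9, 0.8, 0.5]   # arm TARGET is suboptimal (assumed)
T = 10_000                     # horizon
MARGIN = 0.1                   # how far below the target the attacker pushes other arms

counts = [0] * K               # pulls per arm, as seen by the learner
sums = [0.0] * K               # corrupted reward sums, as seen by the learner
attack_cost = 0.0

random.seed(0)

for t in range(1, T + 1):
    # UCB1: pull each arm once, then maximize empirical mean + confidence bonus.
    if t <= K:
        arm = t - 1
    else:
        arm = max(
            range(K),
            key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
        )

    reward = TRUE_MEANS[arm] + random.gauss(0.0, 0.1)   # stochastic reward

    # Adversary: when a non-target arm is pulled, lower the observed reward so the
    # arm's corrupted empirical mean stays below the target arm's mean by MARGIN.
    corruption = 0.0
    if arm != TARGET and counts[TARGET] > 0:
        desired = sums[TARGET] / counts[TARGET] - MARGIN
        uncorrupted_mean = (sums[arm] + reward) / (counts[arm] + 1)
        if uncorrupted_mean > desired:
            corruption = (counts[arm] + 1) * (uncorrupted_mean - desired)
    attack_cost += abs(corruption)

    counts[arm] += 1
    sums[arm] += reward - corruption   # the learner only sees the corrupted reward

print(f"target-arm pulls: {counts[TARGET]} / {T}")
print(f"cumulative attack cost: {attack_cost:.2f}")
```

Running this sketch shows the learner pulling the target arm for all but a vanishing fraction of rounds; the paper's contribution is an attack whose cumulative cost grows only as $\widehat{O}(\sqrt{\log T})$, together with a matching lower bound.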

Cite

Text

Zuo. "Near Optimal Adversarial Attack on UCB Bandits." ICML 2023 Workshops: AdvML-Frontiers, 2023.

Markdown

[Zuo. "Near Optimal Adversarial Attack on UCB Bandits." ICML 2023 Workshops: AdvML-Frontiers, 2023.](https://mlanthology.org/icmlw/2023/zuo2023icmlw-near/)

BibTeX

@inproceedings{zuo2023icmlw-near,
  title     = {{Near Optimal Adversarial Attack on UCB Bandits}},
  author    = {Zuo, Shiliang},
  booktitle = {ICML 2023 Workshops: AdvML-Frontiers},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/zuo2023icmlw-near/}
}