Near Optimal Adversarial Attack on UCB Bandits
Abstract
I study a stochastic multi-armed bandit problem where rewards are subject to adversarial corruption. At each round, the learner chooses an arm, and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner is only able to observe the corrupted reward at each round. I propose a novel attack strategy that manipulates a learner employing the upper-confidence-bound (UCB) algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near-optimal.
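The abstract does not spell out the attack, so as a toy illustration of the *setting* only: below is a minimal simulation in which a UCB1 learner receives corrupted rewards, and the attacker simply drags every non-target reward down below the target arm's mean. This naive corruption rule, the arm means, the noise level, and the 0.3 margin are all assumptions for the sketch; they are not the paper's (near-optimal) strategy, whose cost is far lower.

```python
import math
import random

def ucb_attack_sim(T=5000, means=(0.9, 0.5), target=1, seed=0):
    """Toy simulation: a UCB1 learner observes corrupted rewards.

    The attacker's rule here is deliberately crude: whenever a
    non-target arm is pulled, shift its reward down so the learner
    sees a value around means[target] - 0.3. This illustrates the
    threat model (learner only sees corrupted rewards), not the
    paper's attack, and its cumulative cost is much larger than
    the O(sqrt(log T)) rate the paper achieves.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K          # corrupted reward totals seen by the learner
    pulls_target = 0
    attack_cost = 0.0         # cumulative |corruption| paid by the attacker
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1       # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean + exploration bonus
            arm = max(range(K), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = means[arm] + rng.gauss(0.0, 0.1)
        if arm != target:
            corruption = reward - (means[target] - 0.3)
            attack_cost += abs(corruption)
            reward -= corruption   # learner observes the corrupted reward
        counts[arm] += 1
        sums[arm] += reward
        pulls_target += (arm == target)
    return pulls_target, attack_cost
```

Running the sketch, the learner ends up pulling the suboptimal target arm for the vast majority of the $T$ rounds, mirroring the $T - o(T)$ behavior in the abstract, while the non-target (truly optimal) arm is pulled only $O(\log T)$ times.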
Cite
Zuo. "Near Optimal Adversarial Attack on UCB Bandits." ICML 2023 Workshops: AdvML-Frontiers, 2023. https://mlanthology.org/icmlw/2023/zuo2023icmlw-near/
BibTeX
@inproceedings{zuo2023icmlw-near,
title = {{Near Optimal Adversarial Attack on UCB Bandits}},
author = {Zuo, Shiliang},
booktitle = {ICML 2023 Workshops: AdvML-Frontiers},
year = {2023},
url = {https://mlanthology.org/icmlw/2023/zuo2023icmlw-near/}
}