Stochastic Dueling Bandits with Adversarial Corruption

Abstract

The dueling bandits problem has received a lot of attention in recent years due to its applications in recommendation systems and information retrieval. However, due to the prevalence of malicious users in these systems, it is becoming increasingly important to design dueling bandit algorithms that are robust to corruptions introduced by these malicious users. In this paper we study dueling bandits in the presence of an adversary that can corrupt some of the feedback received by the learner. We propose an algorithm for this problem that is agnostic to the amount of corruption introduced by the adversary: its regret degrades gracefully with the amount of corruption, and in case of no corruption, it essentially matches the optimal regret bounds achievable in the purely stochastic dueling bandits setting.

Cite

Text

Agarwal et al. "Stochastic Dueling Bandits with Adversarial Corruption." Proceedings of the 32nd International Conference on Algorithmic Learning Theory, 2021.

Markdown

[Agarwal et al. "Stochastic Dueling Bandits with Adversarial Corruption." Proceedings of the 32nd International Conference on Algorithmic Learning Theory, 2021.](https://mlanthology.org/alt/2021/agarwal2021alt-stochastic-a/)

BibTeX

@inproceedings{agarwal2021alt-stochastic-a,
  title     = {{Stochastic Dueling Bandits with Adversarial Corruption}},
  author    = {Agarwal, Arpit and Agarwal, Shivani and Patil, Prathamesh},
  booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  year      = {2021},
  pages     = {217-248},
  volume    = {132},
  url       = {https://mlanthology.org/alt/2021/agarwal2021alt-stochastic-a/}
}