A Relative Exponential Weighing Algorithm for Adversarial Utility-Based Dueling Bandits
Abstract
We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.
Cite
Text
Gajane et al. "A Relative Exponential Weighing Algorithm for Adversarial Utility-Based Dueling Bandits." International Conference on Machine Learning, 2015.Markdown
[Gajane et al. "A Relative Exponential Weighing Algorithm for Adversarial Utility-Based Dueling Bandits." International Conference on Machine Learning, 2015.](https://mlanthology.org/icml/2015/gajane2015icml-relative/)BibTeX
@inproceedings{gajane2015icml-relative,
title = {{A Relative Exponential Weighing Algorithm for Adversarial Utility-Based Dueling Bandits}},
author = {Gajane, Pratik and Urvoy, Tanguy and Clérot, Fabrice},
booktitle = {International Conference on Machine Learning},
year = {2015},
pages = {218-227},
volume = {37},
url = {https://mlanthology.org/icml/2015/gajane2015icml-relative/}
}