Position-Based Multiple-Play Bandit Problem with Unknown Position Bias

Abstract

Motivated by online advertising, we study a multiple-play multi-armed bandit problem with position bias, in which items are placed in several slots and later slots yield fewer rewards. We characterize the hardness of the problem by deriving an asymptotic regret lower bound. We then propose the Permutation Minimum Empirical Divergence (PMED) algorithm and derive its asymptotically optimal regret bound. Unlike the usual partial monitoring and semi-bandit problems, the uncertainty in the position bias makes the optimizations required by an optimal algorithm non-convex. We address these optimizations with a cutting-plane method and a related bi-convex relaxation that introduces auxiliary variables.
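To make the setting concrete, here is a minimal sketch of the position-based model (PBM) studied in the paper: the arm placed in slot l is clicked with probability theta[arm] * kappa[slot], where both the arm qualities theta and the position biases kappa are unknown to the learner. The function name and the specific parameter values below are illustrative, not from the paper.

```python
import random

def pbm_round(theta, kappa, ranking, rng=None):
    """Simulate one round of the position-based model: the arm shown in
    slot l is clicked with probability theta[arm] * kappa[l]."""
    rng = rng or random.Random()
    return [rng.random() < theta[arm] * kappa[slot]
            for slot, arm in enumerate(ranking)]

# Hypothetical instance: 4 arms, 3 slots, decreasing position bias.
theta = [0.9, 0.6, 0.3, 0.1]   # arm click-through rates (unknown to the learner)
kappa = [1.0, 0.6, 0.3]        # position biases (also unknown in this setting)
clicks = pbm_round(theta, kappa, ranking=[0, 1, 2], rng=random.Random(42))
```

A bandit algorithm such as PMED would observe only `clicks` each round and must jointly estimate `theta` and `kappa` while choosing which arms to place in which slots.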

Cite

Text

Komiyama et al. "Position-Based Multiple-Play Bandit Problem with Unknown Position Bias." Neural Information Processing Systems, 2017.

Markdown

[Komiyama et al. "Position-Based Multiple-Play Bandit Problem with Unknown Position Bias." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/komiyama2017neurips-positionbased/)

BibTeX

@inproceedings{komiyama2017neurips-positionbased,
  title     = {{Position-Based Multiple-Play Bandit Problem with Unknown Position Bias}},
  author    = {Komiyama, Junpei and Honda, Junya and Takeda, Akiko},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {4998--5008},
  url       = {https://mlanthology.org/neurips/2017/komiyama2017neurips-positionbased/}
}