Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
Abstract
In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers.
Cite
Text
Xu et al. "Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising." Neural Information Processing Systems, 2013.Markdown
[Xu et al. "Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/xu2013neurips-estimation/)BibTeX
@inproceedings{xu2013neurips-estimation,
title = {{Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising}},
author = {Xu, Min and Qin, Tao and Liu, Tie-Yan},
booktitle = {Neural Information Processing Systems},
year = {2013},
pages = {2400-2408},
url = {https://mlanthology.org/neurips/2013/xu2013neurips-estimation/}
}