Incrementality Bidding via Reinforcement Learning Under Mixed and Delayed Rewards
Abstract
Incrementality, which measures the causal effect of showing an ad to a potential customer (e.g., a user on an internet platform) versus not showing it, is a central object for advertisers in online advertising platforms. This paper investigates how an advertiser can learn to optimize the bidding sequence in an online manner \emph{without} knowing the incrementality parameters in advance. We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most $\widetilde{O}(H^2\sqrt{T})$, which depends on the number of rounds $H$ per episode and the number of episodes $T$, but does not depend on the number of actions (i.e., possible bids). A fundamental difference between our learning problem and standard RL problems is that the realized reward feedback from conversion incrementality is \emph{mixed} and \emph{delayed}. To handle this difficulty, we propose and analyze a novel pairwise moment-matching algorithm to learn the conversion incrementality, which we believe is of independent interest.
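The abstract only names the pairwise moment-matching idea; the sketch below is a toy illustration of the general principle (not the paper's actual algorithm). It assumes a simplified model, with all parameter names hypothetical: each shown ad converts with probability p, and the conversion reward is observed after a Geometric(q) delay, so each round's observed total mixes rewards from many past impressions. Matching two lagged cross-covariances between impressions and observed rewards recovers both p and q.

```python
import random

def simulate(T, p, q, seed=0):
    """Toy model (illustrative, not from the paper): impression x_t ~ Bernoulli(0.5);
    a shown ad converts with probability p, and the conversion is observed after a
    Geometric(q) delay, mixed into a later round's aggregate reward y."""
    rng = random.Random(seed)
    x = [1 if rng.random() < 0.5 else 0 for _ in range(T)]
    y = [0] * T
    for t in range(T):
        if x[t] and rng.random() < p:      # this impression converts
            d = 1
            while rng.random() >= q:       # delay ~ Geometric(q) on {1, 2, ...}
                d += 1
            if t + d < T:
                y[t + d] += 1              # reward lands d rounds later
    return x, y

def cross_cov(x, y, lag):
    """Empirical Cov(y_{t+lag}, x_t)."""
    n = len(x) - lag
    mx = sum(x[:n]) / n
    my = sum(y[lag:]) / n
    return sum((x[t] - mx) * (y[t + lag] - my) for t in range(n)) / n

def pairwise_moment_match(x, y):
    """Under the toy model, Cov(y_{t+d}, x_t) = Var(x) * p * q * (1-q)^(d-1):
    the lag-2 / lag-1 ratio gives 1-q, and the lag-1 value then gives p."""
    c1, c2 = cross_cov(x, y, 1), cross_cov(x, y, 2)
    mx = sum(x) / len(x)
    var_x = mx * (1 - mx)
    q_hat = 1 - c2 / c1
    p_hat = c1 / (q_hat * var_x)
    return p_hat, q_hat
```

The key point the example conveys is that even though no individual conversion can be attributed to its originating impression, pairwise (impression, delayed-reward) moments still identify the conversion and delay parameters.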
Cite

Text
Varadaraja et al. "Incrementality Bidding via Reinforcement Learning Under Mixed and Delayed Rewards." Neural Information Processing Systems, 2022.

Markdown
[Varadaraja et al. "Incrementality Bidding via Reinforcement Learning Under Mixed and Delayed Rewards." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/varadaraja2022neurips-incrementality/)

BibTeX
@inproceedings{varadaraja2022neurips-incrementality,
title = {{Incrementality Bidding via Reinforcement Learning Under Mixed and Delayed Rewards}},
author = {Varadaraja, Ashwinkumar Badanidiyuru and Feng, Zhe and Li, Tianxi and Xu, Haifeng},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/varadaraja2022neurips-incrementality/}
}