Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
Abstract
Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at an unknown moment in time. We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascading model, where a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on the n-step regret of these algorithms. We also establish a lower bound on the regret for cascading non-stationary bandits and show that both algorithms match the lower bound up to a logarithmic factor. Finally, we evaluate their performance on a real-world web search click dataset.
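The paper's two algorithms adapt non-stationary UCB rules (discounting and sliding windows) to cascade feedback, where only items at or above the clicked position are observed. As a rough illustration of the sliding-window flavor, here is a minimal, self-contained sketch; the simulated environment, the per-round window semantics, and all function and parameter names are our own assumptions for exposition, not the paper's exact CascadeSWUCB specification:

```python
import math
import random

def cascade_swucb(attraction_before, attraction_after, breakpoint, n, K, window, seed=0):
    """Sketch of a sliding-window UCB policy in the cascade model.

    Each round: rank items by a UCB index computed from the last
    `window` rounds only, show the top K, and observe cascade feedback
    (the user scans top to bottom and clicks the first attractive item;
    items below the click are unobserved). Returns the total click count.
    """
    rng = random.Random(seed)
    L = len(attraction_before)
    history = []  # one entry per round: list of (item, reward) observations
    clicks = 0
    for t in range(1, n + 1):
        # Abrupt change in attraction probabilities at the breakpoint.
        probs = attraction_before if t <= breakpoint else attraction_after
        # Sliding-window statistics over the most recent `window` rounds.
        pulls, wins = [0] * L, [0.0] * L
        for round_obs in history[-window:]:
            for item, r in round_obs:
                pulls[item] += 1
                wins[item] += r
        tau = max(1, sum(pulls))
        def index(i):
            if pulls[i] == 0:
                return float("inf")  # force exploration of unseen items
            return wins[i] / pulls[i] + math.sqrt(2 * math.log(tau) / pulls[i])
        ranked = sorted(range(L), key=index, reverse=True)[:K]
        # Cascade feedback: click on the first attractive item, if any.
        click_pos = None
        for pos, item in enumerate(ranked):
            if rng.random() < probs[item]:
                click_pos = pos
                break
        observed = ranked if click_pos is None else ranked[:click_pos + 1]
        history.append([(item, 1.0 if pos == click_pos else 0.0)
                        for pos, item in enumerate(observed)])
        if click_pos is not None:
            clicks += 1
    return clicks
```

Discarding observations older than the window is what lets the policy recover after the abrupt change: stale evidence for the formerly attractive item ages out, and its UCB index drops below the newly attractive one.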
Cite
Text
Li and de Rijke. "Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/396
Markdown
[Li and de Rijke. "Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/li2019ijcai-cascading/) doi:10.24963/IJCAI.2019/396
BibTeX
@inproceedings{li2019ijcai-cascading,
title = {{Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model}},
author = {Li, Chang and de Rijke, Maarten},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2019},
pages = {2859--2865},
doi = {10.24963/IJCAI.2019/396},
url = {https://mlanthology.org/ijcai/2019/li2019ijcai-cascading/}
}