Learning Multi-Objective Rewards and User Utility Function in Contextual Bandits for Personalized Ranking

Abstract

This paper tackles the problem of providing users with ranked lists of relevant search results by incorporating contextual features of the users and search results and by learning how a user values multiple objectives. For example, to recommend a ranked list of hotels, an algorithm must learn which hotels are the right price for each user, as well as how users vary in weighting price against location. We formulate this context-aware, multi-objective ranking problem as a Multi-Objective Contextual Ranked Bandit (MOCR-B). To solve the MOCR-B problem, we present a novel algorithm, named Multi-Objective Utility-Upper Confidence Bound (MOU-UCB), whose goal is to generate a ranked list of resources that maximizes rewards across multiple objectives to give relevant search results. MOU-UCB learns to predict per-objective rewards from contextual information, combining the Upper Confidence Bound algorithm for contextual multi-armed bandits with neural network embeddings, and simultaneously learns how a user weights the multiple objectives. Our empirical results reveal that the ranked lists generated by MOU-UCB achieve better click-through rates than approaches that do not learn a utility function over the multiple reward objectives.
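To make the idea concrete, the sketch below illustrates the general pattern the abstract describes: maintaining one linear UCB reward estimator per objective and combining the per-objective upper-confidence estimates through a utility weight vector to rank arms. This is a minimal illustration of the scheme, not the authors' MOU-UCB; the class name, the plain linear (LinUCB-style) estimators in place of neural network embeddings, and the fixed utility weights are all assumptions for illustration.

```python
import numpy as np

class MultiObjectiveUCBSketch:
    """Illustrative sketch: per-objective linear UCB estimates combined
    through a utility weight vector over objectives (not the paper's
    actual MOU-UCB algorithm)."""

    def __init__(self, dim, n_objectives, alpha=1.0):
        self.alpha = alpha
        self.n_obj = n_objectives
        # One ridge-regression model (A, b) per objective, as in LinUCB.
        self.A = [np.eye(dim) for _ in range(n_objectives)]
        self.b = [np.zeros(dim) for _ in range(n_objectives)]
        # Utility weights over objectives; in the paper these are learned
        # from user feedback, here they start uniform for illustration.
        self.w = np.ones(n_objectives) / n_objectives

    def ucb_scores(self, x):
        """Upper-confidence estimate of each objective's reward for context x."""
        scores = np.empty(self.n_obj)
        for k in range(self.n_obj):
            A_inv = np.linalg.inv(self.A[k])
            theta = A_inv @ self.b[k]            # ridge estimate of reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
            scores[k] = theta @ x + bonus
        return scores

    def rank(self, contexts):
        """Rank arms by the utility-weighted sum of per-objective UCB scores."""
        utilities = [self.w @ self.ucb_scores(x) for x in contexts]
        return np.argsort(utilities)[::-1]       # best arm first

    def update(self, x, rewards):
        """Standard per-objective LinUCB update from an observed reward vector."""
        for k in range(self.n_obj):
            self.A[k] += np.outer(x, x)
            self.b[k] += rewards[k] * x
```

A full implementation would additionally update `self.w` from click feedback so that the learned utility function reflects how the user trades the objectives off (e.g., price versus location in the hotel example).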

Cite

Text

Wanigasekara et al. "Learning Multi-Objective Rewards and User Utility Function in Contextual Bandits for Personalized Ranking." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/532

Markdown

[Wanigasekara et al. "Learning Multi-Objective Rewards and User Utility Function in Contextual Bandits for Personalized Ranking." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/wanigasekara2019ijcai-learning/) doi:10.24963/IJCAI.2019/532

BibTeX

@inproceedings{wanigasekara2019ijcai-learning,
  title     = {{Learning Multi-Objective Rewards and User Utility Function in Contextual Bandits for Personalized Ranking}},
  author    = {Wanigasekara, Nirandika and Liang, Yuxuan and Goh, Siong Thye and Liu, Ye and Williams, Joseph Jay and Rosenblum, David S.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {3835--3841},
  doi       = {10.24963/IJCAI.2019/532},
  url       = {https://mlanthology.org/ijcai/2019/wanigasekara2019ijcai-learning/}
}