SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets

Abstract

Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items, which may have interacting effects on user choice, methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
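
As an illustration of the decomposition the abstract describes, the sketch below computes a slate's value as a choice-probability-weighted sum of item-wise LTVs, Q(s, A) = sum over i in A of P(i | s, A) * Qbar(s, i). The abstract does not state the choice model, so the conditional-logit form with a "no click" option used here is an assumption, and the names `slate_q`, `best_slate`, `item_q`, `scores`, and `null_score` are hypothetical. This is not the authors' implementation.

```python
from itertools import combinations


def slate_q(slate, item_q, scores, null_score=1.0):
    """Decomposed slate value under an assumed conditional-logit choice model.

    slate      : tuple of item indices shown to the user
    item_q     : item-wise LTVs, one per item (assumed already learned)
    scores     : unnormalized choice scores, one per item (assumption)
    null_score : score of the "no click" option (assumption)
    """
    # Normalizer includes the no-click option, so probabilities sum to < 1.
    total = null_score + sum(scores[i] for i in slate)
    # Weight each item's LTV by the probability that the user selects it.
    return sum(scores[i] / total * item_q[i] for i in slate)


def best_slate(item_q, scores, k, null_score=1.0):
    """Exhaustive search over all size-k slates; only viable for tiny catalogs."""
    items = range(len(item_q))
    return max(
        combinations(items, k),
        key=lambda s: slate_q(s, item_q, scores, null_score),
    )


# Toy numbers, invented purely for illustration.
item_q = [4.0, 1.0, 3.0, 2.0]
scores = [1.0, 3.0, 2.0, 1.0]

print(slate_q((0, 2), item_q, scores))  # 2.5
print(best_slate(item_q, scores, k=2))  # (0, 2)
```

In this toy example item 1 has the highest choice score but the lowest LTV, so it is left out of the best slate: including it would draw clicks away from more valuable items. The exhaustive search is only there to make the combinatorics visible; it does not scale beyond small catalogs.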

Cite

Text

Ie et al. "SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/360

Markdown

[Ie et al. "SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/ie2019ijcai-slateq/) doi:10.24963/IJCAI.2019/360

BibTeX

@inproceedings{ie2019ijcai-slateq,
  title     = {{SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets}},
  author    = {Ie, Eugene and Jain, Vihan and Wang, Jing and Narvekar, Sanmit and Agarwal, Ritesh and Wu, Rui and Cheng, Heng-Tze and Chandra, Tushar and Boutilier, Craig},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {2592--2599},
  doi       = {10.24963/IJCAI.2019/360},
  url       = {https://mlanthology.org/ijcai/2019/ie2019ijcai-slateq/}
}