Fast Slate Policy Optimization: Going Beyond Plackett-Luce

Abstract

An increasingly important building block of large-scale machine learning systems is based on returning slates, i.e., ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure so that online queries can be completed quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.

Cite

Text

Sakhi et al. "Fast Slate Policy Optimization: Going Beyond Plackett-Luce." Transactions on Machine Learning Research, 2023.

Markdown

[Sakhi et al. "Fast Slate Policy Optimization: Going Beyond Plackett-Luce." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/sakhi2023tmlr-fast/)

BibTeX

@article{sakhi2023tmlr-fast,
  title     = {{Fast Slate Policy Optimization: Going Beyond Plackett-Luce}},
  author    = {Sakhi, Otmane and Rohde, David and Chopin, Nicolas},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/sakhi2023tmlr-fast/}
}