Gaussian Process Bandits for Top-K Recommendations
Abstract
Algorithms that utilize bandit feedback to optimize top-k recommendations are vital for online marketplaces, search engines, and content platforms. However, the combinatorial nature of this problem poses a significant challenge, as the number of possible ordered top-k recommendations from $n$ items grows exponentially with $k$. As a result, previous work often relies on restrictive assumptions about the reward or bandit feedback models, such as assuming that the feedback discloses rewards for each recommended item rather than a single scalar feedback for the entire set of top-k recommendations. We introduce a novel contextual bandit algorithm for top-k recommendations, leveraging a Gaussian process with a Kendall kernel to model the reward function. Our algorithm requires only scalar feedback from the top-k recommendations and does not impose restrictive assumptions on the reward structure. Theoretical analysis confirms that the proposed algorithm achieves sub-linear regret with respect to the number of rounds and arms. Additionally, empirical results using a bandit simulator demonstrate that the proposed algorithm outperforms other baselines across various scenarios.
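The abstract's central modeling ingredient is a Kendall kernel, which measures similarity between two rankings by comparing how many item pairs they order the same way. As a minimal sketch (not the paper's implementation, which extends to top-k partial rankings), the classic Kendall kernel between two full rankings over the same item set can be written as the fraction of concordant minus discordant pairs:

```python
from itertools import combinations

def kendall_kernel(pi, sigma):
    """Kendall kernel between two full rankings of the same item set:
    (# concordant pairs - # discordant pairs) / (# total pairs).
    Returns a value in [-1, 1]; identical rankings give 1.0."""
    pos_pi = {item: rank for rank, item in enumerate(pi)}
    pos_sigma = {item: rank for rank, item in enumerate(sigma)}
    n = len(pi)
    total = n * (n - 1) // 2
    score = 0
    for a, b in combinations(pi, 2):
        # The pair (a, b) is concordant if both rankings order it the same way.
        agree = (pos_pi[a] - pos_pi[b]) * (pos_sigma[a] - pos_sigma[b])
        score += 1 if agree > 0 else -1
    return score / total

# Identical rankings are maximally similar; reversed rankings minimally so.
print(kendall_kernel([1, 2, 3, 4], [1, 2, 3, 4]))  # → 1.0
print(kendall_kernel([1, 2, 3, 4], [4, 3, 2, 1]))  # → -1.0
```

This kernel is known to be positive definite, which is what makes it usable as a Gaussian process covariance over the space of rankings; a GP bandit would then select the top-k recommendation maximizing an acquisition rule (e.g., UCB) under the resulting posterior.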
Cite
Yadav et al. "Gaussian Process Bandits for Top-K Recommendations." Neural Information Processing Systems, 2024. doi:10.52202/079017-2377
@inproceedings{yadav2024neurips-gaussian,
title = {{Gaussian Process Bandits for Top-K Recommendations}},
author = {Yadav, Mohit and Sheldon, Daniel and Musco, Cameron},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-2377},
url = {https://mlanthology.org/neurips/2024/yadav2024neurips-gaussian/}
}