Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations
Abstract
Offline reinforcement learning has become a powerful tool for optimizing recommender systems by learning from logged user interactions. However, existing methods rely on conservative exploration, limiting their ability to discover diverse and high-reward content. This paper introduces Bias-Reducing Aggressive Variance-Driven Exploration (BRAVE), an uncertainty-aware exploration strategy that effectively balances exploration and exploitation while addressing data bias to some extent in recommender systems. Unlike traditional offline RL methods that penalize uncertainty, BRAVE leverages uncertainty as a positive signal, guiding the agent toward underrepresented yet potentially high-reward recommendations. We evaluate BRAVE on KuaiRec, KuaiRand, and Yahoo datasets, demonstrating its effectiveness in prolonging user interaction and identifying highly relevant items, leading to improved user satisfaction. Moreover, BRAVE’s strong performance on biased datasets underscores the potential of aggressive exploration in offline RL, providing a novel approach to breaking filter bubbles and reducing bias in recommender systems.
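To illustrate the contrast the abstract draws between penalizing uncertainty and treating it as a positive signal, here is a minimal sketch. It is not the BRAVE algorithm itself, whose details this abstract does not give. It assumes uncertainty is estimated as the standard deviation across a hypothetical ensemble of reward predictors, and the names `score_items`, `lam`, and `aggressive` are invented for this example.

```python
import numpy as np

def score_items(ensemble_preds, lam=1.0, aggressive=True):
    """Score items from an ensemble of reward predictions (illustrative only).

    ensemble_preds: array of shape (n_models, n_items); each row is one
    hypothetical model's predicted reward for every item.
    lam: assumed weight on the uncertainty term.
    aggressive: if True, uncertainty is added as a bonus; if False, it is
    subtracted as a penalty, as in conservative offline RL.
    """
    preds = np.asarray(ensemble_preds, dtype=float)
    mean = preds.mean(axis=0)   # estimated reward per item
    std = preds.std(axis=0)     # ensemble disagreement as an uncertainty proxy
    sign = 1.0 if aggressive else -1.0
    return mean + sign * lam * std

# Item 0 is well covered by the logs (models agree); item 1 is rare (models disagree).
preds = np.array([[1.0, 0.2],
                  [1.0, 1.6],
                  [1.0, 1.0]])

conservative_choice = int(np.argmax(score_items(preds, aggressive=False)))
aggressive_choice = int(np.argmax(score_items(preds, aggressive=True)))
```

With these toy numbers the penalized score selects the well-covered item 0, while the bonus score selects the underrepresented item 1. That is the behavioral difference the abstract describes, under the assumptions stated above.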
Cite
Text
Shi et al. "Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06096-9_29
Markdown
[Shi et al. "Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/shi2025ecmlpkdd-aggressive/) doi:10.1007/978-3-032-06096-9_29
BibTeX
@inproceedings{shi2025ecmlpkdd-aggressive,
title = {{Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations}},
author = {Shi, Kexin and Wang, Wenjia and Jing, Bingyi},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {502-518},
doi = {10.1007/978-3-032-06096-9_29},
url = {https://mlanthology.org/ecmlpkdd/2025/shi2025ecmlpkdd-aggressive/}
}