Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations

Shi, Kexin; Wang, Wenjia; Jing, Bingyi

doi:10.1007/978-3-032-06096-9_29

ECML-PKDD 2025 pp. 502-518

Abstract

Offline reinforcement learning has become a powerful tool for optimizing recommender systems by learning from logged user interactions. However, existing methods rely on conservative exploration, limiting their ability to discover diverse and high-reward content. This paper introduces Bias-Reducing Aggressive Variance-Driven Exploration (BRAVE), an uncertainty-aware exploration strategy that effectively balances exploration and exploitation while addressing data bias to some extent in recommender systems. Unlike traditional offline RL methods that penalize uncertainty, BRAVE leverages uncertainty as a positive signal, guiding the agent toward underrepresented yet potentially high-reward recommendations. We evaluate BRAVE on KuaiRec, KuaiRand, and Yahoo datasets, demonstrating its effectiveness in prolonging user interaction and identifying highly relevant items, leading to improved user satisfaction. Moreover, BRAVE’s strong performance on biased datasets underscores the potential of aggressive exploration in offline RL, providing a novel approach to breaking filter bubbles and reducing bias in recommender systems.

Cite

Text

Shi et al. "Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06096-9_29

Markdown

[Shi et al. "Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/shi2025ecmlpkdd-aggressive/) doi:10.1007/978-3-032-06096-9_29

BibTeX

@inproceedings{shi2025ecmlpkdd-aggressive,
  title     = {{Aggressive Exploration in Offline Reinforcement Learning for Better Recommendations}},
  author    = {Shi, Kexin and Wang, Wenjia and Jing, Bingyi},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {502--518},
  doi       = {10.1007/978-3-032-06096-9_29},
  url       = {https://mlanthology.org/ecmlpkdd/2025/shi2025ecmlpkdd-aggressive/}
}