AgentRecBench: Benchmarking LLM Agent-Based Personalized Recommender Systems

Abstract

The emergence of agentic recommender systems powered by Large Language Models (LLMs) represents a paradigm shift in personalized recommendation, leveraging LLMs’ advanced reasoning and role-playing capabilities to enable autonomous, adaptive decision-making. Unlike traditional recommendation approaches, agentic recommender systems can dynamically gather and interpret user-item interactions from complex environments, generating robust recommendation strategies that generalize across diverse scenarios. However, the field currently lacks standardized evaluation protocols to systematically assess these methods. To address this critical gap, we propose: (1) an interactive textual recommendation simulator incorporating rich user and item metadata and three typical evaluation scenarios (classic, evolving-interest, and cold-start recommendation tasks); (2) a unified modular framework for developing agentic recommender systems; and (3) the first comprehensive benchmark comparing over 10 classical and agentic recommendation methods. Our findings demonstrate the superiority of agentic systems and establish actionable design guidelines for their core components. The benchmark environment has been rigorously validated through an open challenge and remains publicly available, with a maintained leaderboard at https://tsinghua-fib-lab.github.io/AgentSocietyChallenge/pages/overview.html. The benchmark dataset is available at https://huggingface.co/datasets/SGJQovo/AgentRecBench.

Cite

Shang et al. "AgentRecBench: Benchmarking LLM Agent-Based Personalized Recommender Systems." Advances in Neural Information Processing Systems, 2025.

BibTeX

@inproceedings{shang2025neurips-agentrecbench,
  title     = {{AgentRecBench: Benchmarking LLM Agent-Based Personalized Recommender Systems}},
  author    = {Shang, Yu and Liu, Peijie and Yan, Yuwei and Wu, Zijing and Sheng, Leheng and Yu, Yuanqing and Jiang, Chumeng and Zhang, An and Xu, Fengli and Wang, Yu and Zhang, Min and Li, Yong},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/shang2025neurips-agentrecbench/}
}