Mei, Zhiyu

6 publications

ICLR 2026 Unlocking Long-Horizon Agentic Search with Large-Scale End-to-End RL Jiaxuan Gao, Wei Fu, Minyang Xie, Shusheng Xu, Chuyi He, Zhiyu Mei, Banghua Zhu, Yi Wu
NeurIPS 2025 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Wang Jiashu, Tongkai Yang, Binhang Yuan, Yi Wu
NeurIPS 2025 How Far Are We from Optimal Reasoning Efficiency? Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu
ICML 2024 Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
ICLR 2024 SRL: Scaling Distributed Reinforcement Learning to over Ten Thousand Cores Zhiyu Mei, Wei Fu, Jiaxuan Gao, Guangju Wang, Huanchen Zhang, Yi Wu
ICMLW 2023 SRL: Scaling Distributed Reinforcement Learning to over Ten Thousand Cores Zhiyu Mei, Wei Fu, Guangju Wang, Huanchen Zhang, Yi Wu