Gao, Jiaxuan

11 publications

NeurIPS 2025 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning. Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Wang Jiashu, Tongkai Yang, Binhang Yuan, Yi Wu
NeurIPS 2025 How Far Are We from Optimal Reasoning Efficiency? Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu
NeurIPS 2025 Reasoning Is Not a Race: When Stopping Early Beats Going Deeper. Mohan Zhang, Jiaxuan Gao, Shusheng Xu, Yi Wu
ICML 2024 Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study. Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
ICLR 2024 SRL: Scaling Distributed Reinforcement Learning to over Ten Thousand Cores. Zhiyu Mei, Wei Fu, Jiaxuan Gao, Guangju Wang, Huanchen Zhang, Yi Wu
NeurIPSW 2024 Sharing Minds During MARL Training for Enhanced Cooperative LLM Agents. Jiaxuan Gao, Yule Wen, Chao Yu, Yi Wu
ICLR 2023 Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased. Chao Yu, Jiaxuan Gao, Weilin Liu, Botian Xu, Hao Tang, Jiaqi Yang, Yu Wang, Yi Wu
ICLR 2023 SpeedyZero: Mastering Atari with Limited Data and Time. Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
ECCV 2022 Learning Efficient Multi-Agent Cooperative Visual Exploration. Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu
NeurIPS 2022 The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu
NeurIPSW 2021 Learning Efficient Multi-Agent Cooperative Visual Exploration. Chao Yu, Xinyi Yang, Jiaxuan Gao, Huazhong Yang, Yu Wang, Yi Wu