RuibinZheng

1 publication

ICLR 2026
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
Han Zhang, RuibinZheng, Zexuan Yi, Zhuo Zhang, Hanyang Peng, Hui Wang, Jiayin Qi, Binxing Fang, Ruifeng Xu, Yue Yu