Ge, Xinmu

1 publication

ICLR 2026. Co-Rewarding: Stable Self-Supervised RL for Eliciting Reasoning in Large Language Models. Zizhuo Zhang, Jianing Zhu, Xinmu Ge, Zihua Zhao, Zhanke Zhou, Xuan Li, Xiao Feng, Jiangchao Yao, Bo Han