Yuan, Weizhe
11 publications
ICLR
2026
RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
Zhaoning Yu, Zhaolun Su, Leitian Tao, Haozhu Wang, Aashu Singh, Hanchao Yu, Jianyu Wang, Hongyang Gao, Weizhe Yuan, Jason E Weston, Ping Yu, Jing Xu
NeurIPS
2025
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan, Jane Yu, Song Jiang, Karthik Padthe, Yang Li, Dong Wang, Ilia Kulikov, Kyunghyun Cho, Yuandong Tian, Jason E Weston, Xian Li