Su, Zhaolun
1 publication
ICLR
2026
RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
Zhaoning Yu, Zhaolun Su, Leitian Tao, Haozhu Wang, Aashu Singh, Hanchao Yu, Jianyu Wang, Hongyang Gao, Weizhe Yuan, Jason E Weston, Ping Yu, Jing Xu