Su, Zhaolun

1 publication

ICLR 2026. RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization. Zhaoning Yu, Zhaolun Su, Leitian Tao, Haozhu Wang, Aashu Singh, Hanchao Yu, Jianyu Wang, Hongyang Gao, Weizhe Yuan, Jason E Weston, Ping Yu, Jing Xu