Shao, Hui
1 publication
ICLR 2026
RiskPO: Risk-Based Policy Optimization with Verifiable Reward for LLM Post-Training
Tao Ren, Jinyang Jiang, Hui Yang, Wan Tian, Minhao Zou, Guanghao Li, Zishi Zhang, Qinghao Wang, Shentao Qin, Yanjun Zhao, Rui Tao, Hui Shao, Yijie Peng