Shao, Hui

1 publication

ICLR 2026
RiskPO: Risk-Based Policy Optimization with Verifiable Reward for LLM Post-Training
Tao Ren, Jinyang Jiang, Hui Yang, Wan Tian, Minhao Zou, Guanghao Li, Zishi Zhang, Qinghao Wang, Shentao Qin, Yanjun Zhao, Rui Tao, Hui Shao, Yijie Peng