Zhang, Hanning

3 publications

TMLR 2025 Entropy-Regularized Process Reward Model Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang
NeurIPS 2025 Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Jiarui Yao, Yifan Hao, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, Tong Zhang
ICML 2025 Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods Yifan Hao, Xingyuan Pan, Hanning Zhang, Chenlu Ye, Rui Pan, Tong Zhang