Dong, Hande

2 publications

ICLR 2026. Scheduling Your LLM Reinforcement Learning with Reasoning Trees. Hong Wang, Zhezheng Hao, Jian Luo, Chenxing Wei, Yao Shu, Lei Liu, Cheaterlin, Hande Dong, Jiawei Chen
NeurIPS 2025. ReDit: Reward Dithering for Improved LLM Policy Optimization. Chenxing Wei, Jiarui Yu, Ying Tiffany He, Hande Dong, Yao Shu, Fei Yu