Dong, Hande

1 publication

NeurIPS 2025. ReDit: Reward Dithering for Improved LLM Policy Optimization. Chenxing Wei, Jiarui Yu, Ying Tiffany He, Hande Dong, Yao Shu, Fei Yu.