Zhao, Hanyang

7 publications

ICLR 2025 MallowsPO: Fine-Tune Your LLM with Preference Dispersions Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang
JAIR 2025 Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
ICLR 2025 RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, David Yao, Wenpin Tang, Sambit Sahu
ICML 2025 Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-Time Reinforcement Learning Hanyang Zhao, Haoxian Chen, Ji Zhang, David Yao, Wenpin Tang
ICLRW 2025 Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-Time Reinforcement Learning Hanyang Zhao, Haoxian Chen, Ji Zhang, David Yao, Wenpin Tang
NeurIPSW 2024 Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang
NeurIPS 2023 Policy Optimization for Continuous Reinforcement Learning Hanyang Zhao, Wenpin Tang, David Yao