ML Anthology
Zhao, Hanyang
7 publications
ICLR 2025
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang
JAIR 2025
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu
ICLR 2025
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, David Yao, Wenpin Tang, Sambit Sahu
ICML 2025
Score as Action: Fine Tuning Diffusion Generative Models by Continuous-Time Reinforcement Learning
Hanyang Zhao, Haoxian Chen, Ji Zhang, David Yao, Wenpin Tang
ICLRW 2025
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-Time Reinforcement Learning
Hanyang Zhao, Haoxian Chen, Ji Zhang, David Yao, Wenpin Tang
NeurIPSW 2024
Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang
NeurIPS 2023
Policy Optimization for Continuous Reinforcement Learning
Hanyang Zhao, Wenpin Tang, David Yao