Wang, Yizhong
11 publications
NeurIPSW
2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
NeurIPSW
2024
Personalized Soups: Personalized Large Language Model Alignment via Post-Hoc Parameter Merging
NeurIPS
2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback