Ivison, Hamish
7 publications
NeurIPSW
2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
NeurIPS
2024
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
NeurIPSW
2024
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning