ML Anthology
Thomas, Drake
2 publications
NeurIPS
2024
Catastrophic Goodhart: Regularizing RLHF with KL Divergence Does Not Mitigate Heavy-Tailed Reward Misspecification
Thomas Kwa, Drake Thomas, Adrià Garriga-Alonso
ICMLW
2024
Catastrophic Goodhart: Regularizing RLHF with KL Divergence Does Not Mitigate Heavy-Tailed Reward Misspecification
Thomas Kwa, Drake Thomas, Adrià Garriga-Alonso