ML Anthology
Khalaf, Hadi
1 publication
NeurIPS
2025
Inference-Time Reward Hacking in Large Language Models
Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio Calmon