Khalaf, Hadi

1 publication

NeurIPS 2025. Inference-Time Reward Hacking in Large Language Models. Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio Calmon.