Biddulph, Caleb

4 publications

ICLR 2026. Breaking Barriers: Do Reinforcement Post Training Gains Transfer to Unseen Domains? Chuxuan Hu, Yuxuan Zhu, Antony Kellermann, Caleb Biddulph, Suppakit Waiwitlikhit, Jason Benn, Daniel Kang
ICML 2025. MONA: Myopic Optimization with Non-Myopic Approval Can Mitigate Multi-Step Reward Hacking. Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
UAI 2023. Bandits with Costly Reward Observations. Aaron D. Tucker, Caleb Biddulph, Claire Wang, Thorsten Joachims
NeurIPSW 2022. Bandits with Costly Reward Observations. Aaron David Tucker, Caleb Biddulph, Claire Wang, Thorsten Joachims