Singhal, Shivam

4 publications

ICLR 2025. Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking. Cassidy Laidlaw, Shivam Singhal, Anca Dragan
ICMLW 2024. Scalable Oversight by Accounting for Unreliable Feedback. Shivam Singhal, Cassidy Laidlaw, Anca Dragan
ICMLW 2023. Preventing Reward Hacking with Occupancy Measure Regularization. Cassidy Laidlaw, Shivam Singhal, Anca Dragan
ICMLW 2023. Preventing Reward Hacking with Occupancy Measure Regularization. Cassidy Laidlaw, Shivam Singhal, Anca Dragan