ML Anthology
Singhal, Shivam
4 publications
ICLR
2025
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Cassidy Laidlaw, Shivam Singhal, Anca Dragan
ICMLW
2024
Scalable Oversight by Accounting for Unreliable Feedback
Shivam Singhal, Cassidy Laidlaw, Anca Dragan
ICMLW
2023
Preventing Reward Hacking with Occupancy Measure Regularization
Cassidy Laidlaw, Shivam Singhal, Anca Dragan
ICMLW
2023
Preventing Reward Hacking with Occupancy Measure Regularization
Cassidy Laidlaw, Shivam Singhal, Anca Dragan