ML Anthology
Woodside, Thomas
1 publication
ICML
2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Hanlin Zhang, Scott Emmons, Dan Hendrycks