Dumbalska, Tsvetomira

1 publication

ICLR 2026. Reward Models Inherit Value Biases from Pretraining. Brian Christian, Jessica A F Thompson, Elle, Vincent Adam, Hannah Rose Kirk, Christopher Summerfield, Tsvetomira Dumbalska