ML Anthology
Cirstea, Bogdan-Ionut
2 publications
NeurIPSW
2024
Inducing Human-like Biases in Moral Reasoning Language Models
Austin Meek, Artem Karpov, Seong Hah Cho, Raymond Koopmanschap, Lucy Farnik, Bogdan-Ionut Cirstea
NeurIPSW
2023
Reinforcement Learning Fine-Tuning of Language Models Is Biased Towards More Extractable Features
Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, Bogdan-Ionut Cirstea