ML Anthology
Griffin, Charlie
6 publications
ICMLW
2024
Games for AI-Control: Models of Safety Evaluations of AI Deployment Protocols
Charlie Griffin, Buck Shlegeris, Alessandro Abate
ICLR
2024
Goodhart's Law in Reinforcement Learning
Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, Joar Max Viktor Skalse
ICLR
2024
On the Expressivity of Objective-Specification Formalisms in Reinforcement Learning
Rohan Subramani, Marcus Williams, Max Heitmann, Halfdan Holm, Charlie Griffin, Joar Max Viktor Skalse
NeurIPSW
2023
Reinforcement Learning Fine-Tuning of Language Models Is Biased Towards More Extractable Features
Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, Bogdan-Ionut Cirstea
NeurIPSW
2022
All’s Well That Ends Well: Avoiding Side Effects with Distance-Impact Penalties
Charlie Griffin, Joar Max Viktor Skalse, Lewis Hammond, Alessandro Abate
IJCAI
2022
Lexicographic Multi-Objective Reinforcement Learning
Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate