ML Anthology
Dorn, Diego
2 publications
ICMLW
2024
BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
Diego Dorn, Alexandre Variengien, Charbel-Raphael Segerie, Vincent Corruble
NeurIPSW
2023
Goal Misgeneralization as Implicit Goal Conditioning
Diego Dorn, Neel Alex, David Krueger