ML Anthology
Velja, Joan
4 publications
TMLR
2024
'Explaining RL Decisions with Trajectories': A Reproducibility Study
Karim Ahmed Abdel Sadek, Matteo Nulli, Joan Velja, Jort Vincenti
NeurIPSW
2024
Emergence of Steganography Between Large Language Models
Yohan Mathew, Robert McCarthy, Joan Velja, Ollie Matthews, Nandi Schoots, Dylan Cope
NeurIPSW
2024
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan Cope, Nandi Schoots
NeurIPSW
2024
Steganography in Large Language Models: Investigating Emergence and Mitigations
Yohan Mathew, Robert McCarthy, Ollie Matthews, Joan Velja, Nandi Schoots, Dylan Cope