Velja, Joan

4 publications

TMLR 2024 'Explaining RL Decisions with Trajectories': A Reproducibility Study Karim Ahmed Abdel Sadek, Matteo Nulli, Joan Velja, Jort Vincenti
NeurIPSW 2024 Emergence of Steganography Between Large Language Models Yohan Mathew, Robert McCarthy, Joan Velja, Ollie Matthews, Nandi Schoots, Dylan Cope
NeurIPSW 2024 Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan Cope, Nandi Schoots
NeurIPSW 2024 Steganography in Large Language Models: Investigating Emergence and Mitigations Yohan Mathew, Robert McCarthy, Ollie Matthews, Joan Velja, Nandi Schoots, Dylan Cope