Velja, Joan

4 publications

TMLR 2024 'Explaining RL Decisions with Trajectories': A Reproducibility Study Karim Ahmed Abdel Sadek, Matteo Nulli, Joan Velja, Jort Vincenti

NeurIPSW 2024 Emergence of Steganography Between Large Language Models Yohan Mathew, Robert McCarthy, Joan Velja, Ollie Matthews, Nandi Schoots, Dylan Cope

NeurIPSW 2024 Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan Cope, Nandi Schoots

NeurIPSW 2024 Steganography in Large Language Models: Investigating Emergence and Mitigations Yohan Mathew, Robert McCarthy, Ollie Matthews, Joan Velja, Nandi Schoots, Dylan Cope