ML Anthology
Xhonneux, Sophie
6 publications
ICLRW
2025
A Generative Approach to LLM Harmfulness Detection with Red Flag Tokens
Sophie Xhonneux, David Dobre, Mehrnaz Mofakhami, Leo Schwinn, Gauthier Gidel
ICLR
2025
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville
NeurIPS
2024
Efficient Adversarial Training in LLMs with Continuous Attacks
Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn
NeurIPSW
2024
Faster, More Efficient RLHF Through Off-Policy Asynchronous Learning
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville
ICMLW
2024
In-Context Learning, Can It Break Safety?
Sophie Xhonneux, David Dobre, Michael Noukhovitch, Jian Tang, Gauthier Gidel, Dhanya Sridhar
NeurIPS
2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs Through the Embedding Space
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann