ML Anthology
Wehner, Jan
3 publications
ICLRW
2025
Safety Is Essential for Responsible Open-Ended Systems
Ivaxi Sheth, Jan Wehner, Sahar Abdelnabi, Ruta Binkyte, Mario Fritz
TMLR
2025
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
NeurIPS
2024
Representation Noising: A Defence Mechanism Against Harmful Finetuning
Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, David Atanasov, Robie Gonzales, Subhabrata Majumdar, Carsten Maple, Hassan Sajjad, Frank Rudzicz