Wehner, Jan

3 publications

ICLRW 2025 Safety Is Essential for Responsible Open-Ended Systems Ivaxi Sheth, Jan Wehner, Sahar Abdelnabi, Ruta Binkyte, Mario Fritz
TMLR 2025 Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
NeurIPS 2024 Representation Noising: A Defence Mechanism Against Harmful Finetuning Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, David Atanasov, Robie Gonzales, Subhabrata Majumdar, Carsten Maple, Hassan Sajjad, Frank Rudzicz