Braun, Joschka

1 publication

ICLRW 2025. Understanding (Un)Reliability of Steering Vectors in Language Models. Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, Dmitrii Krasheninnikov.