ML Anthology
Farnik, Lucy
5 publications
ICML
2025
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik, Tim Lawson, Conor Houghton, Laurence Aitchison
ICLR
2025
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison
NeurIPSW
2024
Inducing Human-like Biases in Moral Reasoning Language Models
Austin Meek, Artem Karpov, Seong Hah Cho, Raymond Koopmanschap, Lucy Farnik, Bogdan-Ionut Cirstea
NeurIPSW
2024
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison
ICLR
2024
STARC: A General Framework for Quantifying Differences Between Reward Functions
Joar Max Viktor Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate