Lieberum, Tom

5 publications

NeurIPS 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
ICMLW 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, Janos Kramar, Rohin Shah, Neel Nanda
ICLR 2023 Progress Measures for Grokking via Mechanistic Interpretability Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
NeurIPSW 2022 Investigating Causal Understanding in LLMs Marius Hobbhahn, Tom Lieberum, David Seiler
NeurIPSW 2022 Investigating Causal Understanding in LLMs Marius Hobbhahn, Tom Lieberum, David Seiler