Lieberum, Tom

5 publications

NeurIPS 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

ICMLW 2024 Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

ICLR 2023 Progress Measures for Grokking via Mechanistic Interpretability Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt

NeurIPSW 2022 Investigating Causal Understanding in LLMs Marius Hobbhahn, Tom Lieberum, David Seiler

NeurIPSW 2022 Investigating Causal Understanding in LLMs Marius Hobbhahn, Tom Lieberum, David Seiler