ML Anthology
Lieberum, Tom
5 publications
NeurIPS
2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
ICMLW
2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, Janos Kramar, Rohin Shah, Neel Nanda
ICLR
2023
Progress Measures for Grokking via Mechanistic Interpretability
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
NeurIPSW
2022
Investigating Causal Understanding in LLMs
Marius Hobbhahn, Tom Lieberum, David Seiler
NeurIPSW
2022
Investigating Causal Understanding in LLMs
Marius Hobbhahn, Tom Lieberum, David Seiler