ML Anthology
Gurnee, Wes
10 publications
ICLR
2025
Not All Language Model Features Are One-Dimensionally Linear
Joshua Engels, Eric J Michaud, Isaac Liao, Wes Gurnee, Max Tegmark
NeurIPS
2025
Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad, Jin Hwa Lee, Wes Gurnee, Max Tegmark
NeurIPS
2024
Confidence Regulation Neurons in Language Models
Alessandro Stolfo, Ben Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda
ICMLW
2024
Confidence Regulation Neurons in Language Models
Alessandro Stolfo, Ben Peng Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda
ICLR
2024
Language Models Represent Space and Time
Wes Gurnee, Max Tegmark
NeurIPS
2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda
ICMLW
2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi, Oscar Balcells Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda
ICMLW
2024
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad, Wes Gurnee, Max Tegmark
TMLR
2024
Universal Neurons in GPT2 Language Models
Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas
TMLR
2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas