ML Anthology
Variengien, Alexandre
6 publications
ICLR 2025
Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models
Alexandre Variengien, Eric Winsor
ICMLW 2024
BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
Diego Dorn, Alexandre Variengien, Charbel-Raphael Segerie, Vincent Corruble
ICMLW 2024
Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien, Eric Winsor
NeurIPS 2023
How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model
Michael Hanna, Ollie Liu, Alexandre Variengien
ICLR 2023
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small
Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
NeurIPSW 2022
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small
Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt