Variengien, Alexandre

6 publications

ICLR 2025. Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models. Alexandre Variengien, Eric Winsor.
ICMLW 2024. BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards. Diego Dorn, Alexandre Variengien, Charbel-Raphael Segerie, Vincent Corruble.
ICMLW 2024. Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models. Alexandre Variengien, Eric Winsor.
NeurIPS 2023. How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model. Michael Hanna, Ollie Liu, Alexandre Variengien.
ICLR 2023. Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small. Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt.
NeurIPSW 2022. Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small. Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt.