Edelman, Benjamin L.

9 publications

ICML 2024 Distinguishing the Knowable from the Unknowable with Language Models Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman
ICLR 2024 Feature Emergence via Margin Maximization: Case Studies in Algebraic Tasks Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham M. Kakade
TMLR 2024 Foundational Challenges in Assuring Alignment and Safety of Large Language Models Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric J Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Chenyu Zhang, Ruiqi Zhong, Sean O hEigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwani, Yoshua Bengio, Danqi Chen, Philip Torr, Samuel Albanie, Tegan Maharaj, Jakob Nicolaus Foerster, Florian Tramèr, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
NeurIPS 2024 The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains Ezra Edelman, Nikolaos Tsilivis, Benjamin L. Edelman, Eran Malach, Surbhi Goel
NeurIPSW 2024 The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains Ezra Edelman, Nikolaos Tsilivis, Surbhi Goel, Benjamin L. Edelman, Eran Malach
NeurIPS 2024 Transcendence: Generative Models Can Outperform the Experts That Train Them Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, Benjamin L. Edelman, Milind Tambe, Sham Kakade, Eran Malach
ICLRW 2024 Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, Boaz Barak
ICML 2024 Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, Boaz Barak
ICML 2022 Inductive Biases and Variable Creation in Self-Attention Mechanisms Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang