Irving, Geoffrey

9 publications

ICLR 2026 Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs Kyle O'Brien, Stephen Casper, Quentin Gregory Anthony, Tomek Korbak, Robert Kirk, Xander Davies, Ishan Mishra, Geoffrey Irving, Yarin Gal, Stella Biderman
TMLR 2026 Open Technical Problems in Open-Weight AI Model Risk Management Stephen Casper, Kyle O'Brien, Shayne Longpre, Elizabeth Seger, Kevin Klyman, Rishi Bommasani, Aniruddha Nrusimha, Ilia Shumailov, Sören Mindermann, Steven Basart, Frank Rudzicz, Kellin Pelrine, Avijit Ghosh, Andrew Strait, Robert Kirk, Dan Hendrycks, Peter Henderson, J Zico Kolter, Geoffrey Irving, Yarin Gal, Yoshua Bengio, Dylan Hadfield-Menell
ICML 2024 Scalable AI Safety via Doubly-Efficient Debate Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras
ICMLW 2024 Scalable AI Safety via Doubly-Efficient Debate Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras
NeurIPS 2022 Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models Maribeth Rauh, John Mellor, Jonathan Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, Geoffrey Irving, Iason Gabriel, William Isaac, Lisa Anne Hendricks
ICML 2022 Improving Language Models by Retrieving from Trillions of Tokens Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego De Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack Rae, Erich Elsen, Laurent Sifre
Distill 2019 AI Safety Needs Social Scientists Geoffrey Irving, Amanda Askell
NeurIPS 2018 Reward Learning from Human Preferences and Demonstrations in Atari Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei
NeurIPS 2016 DeepMath - Deep Sequence Models for Premise Selection Geoffrey Irving, Christian Szegedy, Alexander A Alemi, Niklas Een, Francois Chollet, Josef Urban