Gleave, Adam

20 publications

AAAI 2025 Can Go AIs Be Adversarially Robust? Tom Tseng, Euan McLean, Kellin Pelrine, Tony Tong Wang, Adam Gleave

NeurIPS 2025 Preference Learning with Lie Detectors Can Induce Honesty or Evasion Chris Cundy, Adam Gleave

AAAI 2025 Scaling Trends for Data Poisoning in LLMs Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine

ICML 2025 Scaling Trends in Language Model Robustness Nikolaus H. R. Howe, Ian R. McKenzie, Oskar John Hollinsworth, Michał Zając, Tom Tseng, Aaron David Tucker, Pierre-Luc Bacon, Adam Gleave

ICMLW 2024 Can Go AIs Be Adversarially Robust? Tom Tseng, Euan McLean, Kellin Pelrine, Tony Tong Wang, Adam Gleave

ICMLW 2024 Exploring Scaling Trends in LLM Robustness Nikolaus H. R. Howe, Michał Zając, Ian R. McKenzie, Oskar John Hollinsworth, Pierre-Luc Bacon, Adam Gleave

ICMLW 2024 Planning Behavior in a Recurrent Neural Network That Plays Sokoban Adrià Garriga-Alonso, Mohammad Taufeeque, Adam Gleave

ICLR 2024 STARC: A General Framework for Quantifying Differences Between Reward Functions Joar Max Viktor Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

ICML 2023 Adversarial Policies Beat Superhuman Go AIs Tony Tong Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell

ICML 2023 Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Joar Max Viktor Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave

NeurIPSW 2022 A General Framework for Reward Function Distances Erik Jenner, Joar Max Viktor Skalse, Adam Gleave

NeurIPSW 2022 Adversarial Policies Beat Professional-Level Go AIs Tony Tong Wang, Adam Gleave, Nora Belrose, Tom Tseng, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Joseph Miller, Sergey Levine, Stuart Russell

NeurIPSW 2022 Adversarial Policies Beat Professional-Level Go AIs Tony Tong Wang, Adam Gleave, Nora Belrose, Tom Tseng, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell

NeurIPSW 2022 On the Fragility of Learned Reward Functions Lev E. McKinney, Yawen Duan, David Krueger, Adam Gleave

NeurIPSW 2022 On the Fragility of Learned Reward Functions Lev E. McKinney, Yawen Duan, David Krueger, Adam Gleave

NeurIPSW 2022 On the Fragility of Learned Reward Functions Lev E. McKinney, Yawen Duan, David Krueger, Adam Gleave

NeurIPSW 2022 Training Equilibria in Reinforcement Learning Lauro Langosco, David Krueger, Adam Gleave

ICLR 2021 Quantifying Differences in Reward Functions Adam Gleave, Michael D Dennis, Shane Legg, Stuart Russell, Jan Leike

MLOSS 2021 Stable-Baselines3: Reliable Reinforcement Learning Implementations Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann

ICLR 2020 Adversarial Policies: Attacking Deep Reinforcement Learning Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell