ML Anthology
Gleave, Adam
20 publications
AAAI
2025
Can Go AIs Be Adversarially Robust?
Tom Tseng, Euan McLean, Kellin Pelrine, Tony Tong Wang, Adam Gleave
NeurIPS
2025
Preference Learning with Lie Detectors Can Induce Honesty or Evasion
Chris Cundy, Adam Gleave
AAAI
2025
Scaling Trends for Data Poisoning in LLMs
Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine
ICML
2025
Scaling Trends in Language Model Robustness
Nikolaus H. R. Howe, Ian R. McKenzie, Oskar John Hollinsworth, Michał Zając, Tom Tseng, Aaron David Tucker, Pierre-Luc Bacon, Adam Gleave
ICMLW
2024
Can Go AIs Be Adversarially Robust?
Tom Tseng, Euan McLean, Kellin Pelrine, Tony Tong Wang, Adam Gleave
ICMLW
2024
Exploring Scaling Trends in LLM Robustness
Nikolaus H. R. Howe, Michał Zając, Ian R. McKenzie, Oskar John Hollinsworth, Pierre-Luc Bacon, Adam Gleave
ICMLW
2024
Planning Behavior in a Recurrent Neural Network That Plays Sokoban
Adrià Garriga-Alonso, Mohammad Taufeeque, Adam Gleave
ICLR
2024
STARC: A General Framework for Quantifying Differences Between Reward Functions
Joar Max Viktor Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate
ICML
2023
Adversarial Policies Beat Superhuman Go AIs
Tony Tong Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
ICML
2023
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning
Joar Max Viktor Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave
NeurIPSW
2022
A General Framework for Reward Function Distances
Erik Jenner, Joar Max Viktor Skalse, Adam Gleave
NeurIPSW
2022
Adversarial Policies Beat Professional-Level Go AIs
Tony Tong Wang, Adam Gleave, Nora Belrose, Tom Tseng, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Joseph Miller, Sergey Levine, Stuart Russell
NeurIPSW
2022
Adversarial Policies Beat Professional-Level Go AIs
Tony Tong Wang, Adam Gleave, Nora Belrose, Tom Tseng, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
NeurIPSW
2022
On the Fragility of Learned Reward Functions
Lev E. McKinney, Yawen Duan, David Krueger, Adam Gleave
NeurIPSW
2022
On the Fragility of Learned Reward Functions
Lev E McKinney, Yawen Duan, David Krueger, Adam Gleave
NeurIPSW
2022
On the Fragility of Learned Reward Functions
Lev E McKinney, Yawen Duan, David Krueger, Adam Gleave
NeurIPSW
2022
Training Equilibria in Reinforcement Learning
Lauro Langosco, David Krueger, Adam Gleave
ICLR
2021
Quantifying Differences in Reward Functions
Adam Gleave, Michael D Dennis, Shane Legg, Stuart Russell, Jan Leike
MLOSS
2021
Stable-Baselines3: Reliable Reinforcement Learning Implementations
Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann
ICLR
2020
Adversarial Policies: Attacking Deep Reinforcement Learning
Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell