Russell, Stuart
135 publications
ICML
2024
Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mosse, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker NeurIPSW
2024
Predicting Human Decisions with Behavioral Theories and Machine Learning
Ori Plonsky, Reut Apel, Eyal Ert, Moshe Tennenholtz, David Bourgin, Joshua Peterson, Daniel Reichman, Thomas L. Griffiths, Stuart Russell, Evan Carter, James F. Cavanagh, Ido Erev ICLR
2024
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell ICML
2023
Adversarial Policies Beat Superhuman Go AIs
Tony Tong Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell NeurIPSW
2023
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell NeurIPSW
2023
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell