Toyer, Sam

10 publications

NeurIPS 2024. A StrongREJECT for Empty Jailbreaks. Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
ICLRW 2024. A StrongREJECT for Empty Jailbreaks. Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
NeurIPSW 2024. Exploring and Addressing Reward Confusion in Offline Preference Learning. Xin Chen, Sam Toyer, Florian Shkurti
ICLR 2024. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game. Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
NeurIPSW 2023. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game. Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
NeurIPSW 2023. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game. Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
JAIR 2020. ASNets: Deep Learning for Generalised Planning. Sam Toyer, Sylvie Thiébaux, Felipe W. Trevizan, Lexing Xie
NeurIPS 2020. The MAGICAL Benchmark for Robust Imitation. Sam Toyer, Rohin Shah, Andrew Critch, Stuart J. Russell
ICLR 2019. Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow. Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine
AAAI 2018. Action Schema Networks: Generalised Policies with Deep Learning. Sam Toyer, Felipe W. Trevizan, Sylvie Thiébaux, Lexing Xie