Svegliato, Justin

12 publications

ICML 2025 AssistanceZero: Scalably Solving Assistance Games Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICLRW 2025 Scalably Solving Assistance Games Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
NeurIPS 2024 A StrongREJECT for Empty Jailbreaks Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
ICLRW 2024 A StrongREJECT for Empty Jailbreaks Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
ICMLW 2024 AssistanceZero: Scalably Solving Assistance Games Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICMLW 2024 Scalably Solving Assistance Games Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, Anca Dragan
ICLR 2024 Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
NeurIPSW 2023 Mitigating Generative Agent Social Dilemmas Julian Yocum, Phillip J.K. Christoffersen, Mehul Damani, Justin Svegliato, Dylan Hadfield-Menell, Stuart Russell
NeurIPSW 2023 Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
NeurIPSW 2023 Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
AAAI 2021 Ethically Compliant Sequential Decision Making Justin Svegliato, Samer B. Nashed, Shlomo Zilberstein
IJCAI 2018 Meta-Level Control of Anytime Algorithms with Online Performance Prediction Justin Svegliato, Kyle Hollins Wray, Shlomo Zilberstein