Mu, Jesse

12 publications

NeurIPSW 2024 Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers Tony Tong Wang, John Hughes, Henry Sleight, Rylan Schaeffer, Rajashree Agrawal, Fazl Barez, Mrinank Sharma, Jesse Mu, Nir N Shavit, Ethan Perez
NeurIPS 2024 Many-Shot Jailbreaking Cem Anil, Esin Durmus, Nina Panickssery, Mrinank Sharma, Joe Benton, Sandipan Kundu, Joshua Batson, Meg Tong, Jesse Mu, Daniel Ford, Francesco Mosconi, Rajashree Agrawal, Rylan Schaeffer, Naomi Bashkansky, Samuel Svenningsen, Mike Lambert, Ansh Radhakrishnan, Carson Denison, Evan J Hubinger, Yuntao Bai, Trenton Bricken, Timothy Maxwell, Nicholas Schiefer, James Sully, Alex Tamkin, Tamera Lanham, Karina Nguyen, Tomasz Korbak, Jared Kaplan, Deep Ganguli, Samuel R. Bowman, Ethan Perez, Roger Baker Grosse, David Duvenaud
NeurIPS 2023 Learning to Compress Prompts with Gist Tokens Jesse Mu, Xiang Li, Noah Goodman
NeurIPS 2022 Active Learning Helps Pretrained Models Learn the Intended Task Alex Tamkin, Dat Nguyen, Salil Deshpande, Jesse Mu, Noah Goodman
ICLRW 2022 Emergent Covert Signaling in Adversarial Reference Games Dhara Yu, Jesse Mu, Noah Goodman
NeurIPS 2022 Improving Intrinsic Exploration with Language Abstractions Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette
NeurIPS 2022 Improving Policy Learning via Language Dynamics Distillation Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, Tim Rocktäschel
NeurIPSW 2022 In the ZONE: Measuring Difficulty and Progression in Curriculum Generation Rose E Wang, Jesse Mu, Dilip Arumugam, Natasha Jaques, Noah Goodman
NeurIPS 2022 STaR: Bootstrapping Reasoning with Reasoning Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah Goodman
NeurIPS 2021 Emergent Communication of Generalizations Jesse Mu, Noah Goodman
NeurIPS 2020 Compositional Explanations of Neurons Jesse Mu, Jacob Andreas