Shah, Rusheb

2 publications

NeurIPSW 2023. Linearly Structured World Representations in Maze-Solving Transformers. Michael Ivanitskiy, Alexander F Spies, Tilman Räuker, Guillaume Corlouer, Christopher Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung
NeurIPSW 2023. Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation. Rusheb Shah, Quentin Feuillade Montixi, Soroush Pour, Arush Tagade, Javier Rando