Wang, Jesson

1 publication

ICLR 2026
JULI: Jailbreak Large Language Models by Self-Introspection
Jesson Wang, Zhanhao Hu, David Wagner