Ri, Narutatsu

2 publications

ICML 2025 Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, Marzyeh Ghassemi

ICML 2024 Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen Mckeown