ML Anthology
Ri, Narutatsu
2 publications
ICML 2025
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, Marzyeh Ghassemi
ICML 2024
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown