Qi, Siya

1 publication

ICLR 2026. When Thinking Backfires: Mechanistic Insights into Reason-Induced Misalignment. Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He.