ML Anthology
Guo, Phillip
1 publication
NeurIPSW 2023
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching
James Campbell, Phillip Guo, Richard Ren