Guo, Phillip

1 publication

NeurIPSW 2023. Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching. James Campbell, Phillip Guo, Richard Ren.